# HG changeset patch
# User Geoffrey Lefebvre
# Date 1227835911 28800
# Node ID 87d43eab70d872ae30c9ff985c1b5ff5e986e843
# Parent  f9d9b9ff4bbc35b78df31567375534173a032450
imported patch big_ring_linux_unstable.patch

diff -r f9d9b9ff4bbc -r 87d43eab70d8 drivers/xen/Kconfig
--- a/drivers/xen/Kconfig	Wed Nov 26 17:00:33 2008 -0800
+++ b/drivers/xen/Kconfig	Thu Nov 27 17:31:51 2008 -0800
@@ -184,6 +184,34 @@
 	  dedicated device-driver domain, or your master control domain
 	  (domain 0), then you almost certainly want to say Y here.
 
+choice
+	depends on XEN_BLKDEV_FRONTEND!=n
+	prompt "Number of pages used for the block device ring"
+	default 1_PAGE
+	help
+	  Select the number of memory pages used to construct the
+	  request/response ring between DomU and Dom0. A bigger ring can
+	  provide higher throughput when used with a high-performance backend
+	  such as an iSCSI filer. If you are using single IDE/SATA drives for
+	  the backend, then increasing the number of pages will not increase
+	  throughput. Using more than 1 page with a Dom0 that does not support
+	  multi-page rings will fail at boot time.
+
+	config 1_PAGE
+		bool "1 page"
+	config 2_PAGE
+		bool "2 pages"
+	config 4_PAGE
+		bool "4 pages"
+endchoice
+
+config XEN_BLKDEV_NUM_RING_PAGES
+	int
+	default 1 if 1_PAGE
+	default 2 if 2_PAGE
+	default 4 if 4_PAGE
+
+
 config XEN_NETDEV_FRONTEND
 	tristate "Network-device frontend driver"
 	depends on NET
diff -r f9d9b9ff4bbc -r 87d43eab70d8 drivers/xen/blkback/blkback.c
--- a/drivers/xen/blkback/blkback.c	Wed Nov 26 17:00:33 2008 -0800
+++ b/drivers/xen/blkback/blkback.c	Thu Nov 27 17:31:51 2008 -0800
@@ -53,7 +53,7 @@
  * This will increase the chances of being able to write whole tracks.
  * 64 should be enough to keep us competitive with Linux.
  */
-static int blkif_reqs = 64;
+static int blkif_reqs = 128;
 module_param_named(reqs, blkif_reqs, int, 0);
 MODULE_PARM_DESC(reqs, "Number of blkback requests to allocate");
diff -r f9d9b9ff4bbc -r 87d43eab70d8 drivers/xen/blkback/common.h
--- a/drivers/xen/blkback/common.h	Wed Nov 26 17:00:33 2008 -0800
+++ b/drivers/xen/blkback/common.h	Thu Nov 27 17:31:51 2008 -0800
@@ -47,6 +47,8 @@
 #define DPRINTK(_f, _a...)			\
 	pr_debug("(file=%s, line=%d) " _f,	\
 		 __FILE__ , __LINE__ , ## _a )
+#define IPRINTK(fmt, args...)			\
+	printk(KERN_INFO "%s: " fmt, __FUNCTION__, ##args)
 
 struct vbd {
 	blkif_vdev_t   handle;      /* what the domain refers to this vbd as */
@@ -92,14 +94,15 @@
 
 	wait_queue_head_t waiting_to_free;
 
-	grant_handle_t shmem_handle;
-	grant_ref_t    shmem_ref;
+	unsigned int   num_ring_pages;
+	grant_handle_t shmem_handle[BLKIF_MAX_NUM_RING_PAGES];
+	grant_ref_t    shmem_ref[BLKIF_MAX_NUM_RING_PAGES];
 } blkif_t;
 
 blkif_t *blkif_alloc(domid_t domid);
 void blkif_disconnect(blkif_t *blkif);
 void blkif_free(blkif_t *blkif);
-int blkif_map(blkif_t *blkif, unsigned long shared_page, unsigned int evtchn);
+int blkif_map(blkif_t *blkif, unsigned long *shared_pages, unsigned int evtchn);
 
 #define blkif_get(_b) (atomic_inc(&(_b)->refcnt))
 #define blkif_put(_b)					\
diff -r f9d9b9ff4bbc -r 87d43eab70d8 drivers/xen/blkback/interface.c
--- a/drivers/xen/blkback/interface.c	Wed Nov 26 17:00:33 2008 -0800
+++ b/drivers/xen/blkback/interface.c	Thu Nov 27 17:31:51 2008 -0800
@@ -34,11 +34,14 @@
 #include 
 #include 
 
+#define INVALID_GRANT_HANDLE ((grant_handle_t)~0U)
+
 static kmem_cache_t *blkif_cachep;
 
 blkif_t *blkif_alloc(domid_t domid)
 {
 	blkif_t *blkif;
+	int i;
 
 	blkif = kmem_cache_alloc(blkif_cachep, GFP_KERNEL);
 	if (!blkif)
@@ -52,53 +55,92 @@
 	blkif->st_print = jiffies;
 	init_waitqueue_head(&blkif->waiting_to_free);
 
+	for (i = 0; i < BLKIF_MAX_NUM_RING_PAGES; i++) {
+		blkif->shmem_handle[i] = INVALID_GRANT_HANDLE;
+	}
+
 	return blkif;
 }
 
-static int map_frontend_page(blkif_t *blkif, unsigned long shared_page)
+
+static void unmap_frontend_pages(blkif_t *blkif)
 {
-	struct gnttab_map_grant_ref op;
+	struct gnttab_unmap_grant_ref op[BLKIF_MAX_NUM_RING_PAGES];
+	int i, op_count = 0;
 
-	gnttab_set_map_op(&op, (unsigned long)blkif->blk_ring_area->addr,
-			  GNTMAP_host_map, shared_page, blkif->domid);
+	for (i = 0; i < blkif->num_ring_pages; i++) {
+		if (blkif->shmem_handle[i] != INVALID_GRANT_HANDLE) {
 
-	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
-		BUG();
+			unsigned long addr = (unsigned long)
+				blkif->blk_ring_area->addr + i * PAGE_SIZE;
 
-	if (op.status) {
-		DPRINTK(" Grant table operation failure !\n");
-		return op.status;
+			gnttab_set_unmap_op(&op[op_count], addr,
+					    GNTMAP_host_map,
+					    blkif->shmem_handle[i]);
+
+			blkif->shmem_handle[i] = INVALID_GRANT_HANDLE;
+
+			op_count++;
+		}
 	}
 
-	blkif->shmem_ref = shared_page;
-	blkif->shmem_handle = op.handle;
-
-	return 0;
-}
-
-static void unmap_frontend_page(blkif_t *blkif)
-{
-	struct gnttab_unmap_grant_ref op;
-
-	gnttab_set_unmap_op(&op, (unsigned long)blkif->blk_ring_area->addr,
-			    GNTMAP_host_map, blkif->shmem_handle);
-
-	if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
+	if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, op, op_count))
 		BUG();
 }
 
-int blkif_map(blkif_t *blkif, unsigned long shared_page, unsigned int evtchn)
+
+static int map_frontend_pages(blkif_t *blkif, unsigned long *shared_pages)
+{
+	struct gnttab_map_grant_ref op[BLKIF_MAX_NUM_RING_PAGES];
+	int i, ret_val = 0;
+
+	for (i = 0; i < blkif->num_ring_pages; i++) {
+		unsigned long addr = (unsigned long)
+			blkif->blk_ring_area->addr + i * PAGE_SIZE;
+
+		gnttab_set_map_op(&op[i], addr, GNTMAP_host_map,
+				  shared_pages[i], blkif->domid);
+	}
+
+	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,
+				      op, blkif->num_ring_pages))
+		BUG();
+
+	for (i = 0; i < blkif->num_ring_pages; i++) {
+
+		if (op[i].status) {
+			DPRINTK(" Grant table operation failure !\n");
+
+			/* record the first error code */
+			if (!ret_val)
+				ret_val = op[i].status;
+		} else {
+			blkif->shmem_ref[i] = shared_pages[i];
+			blkif->shmem_handle[i] = op[i].handle;
+		}
+	}
+
+	/* if any of the mappings failed, unmap all grants */
+	if (ret_val)
+		unmap_frontend_pages(blkif);
+
+	return ret_val;
+}
+
+
+int blkif_map(blkif_t *blkif, unsigned long *shared_pages, unsigned int evtchn)
 {
 	int err;
+	unsigned long ring_area_size = blkif->num_ring_pages * PAGE_SIZE;
 
 	/* Already connected through? */
 	if (blkif->irq)
 		return 0;
 
-	if ( (blkif->blk_ring_area = alloc_vm_area(PAGE_SIZE)) == NULL )
+	if ( (blkif->blk_ring_area = alloc_vm_area(ring_area_size)) == NULL )
 		return -ENOMEM;
 
-	err = map_frontend_page(blkif, shared_page);
+	err = map_frontend_pages(blkif, shared_pages);
 	if (err) {
 		free_vm_area(blkif->blk_ring_area);
 		return err;
@@ -109,21 +151,21 @@
 	{
 		blkif_sring_t *sring;
 		sring = (blkif_sring_t *)blkif->blk_ring_area->addr;
-		BACK_RING_INIT(&blkif->blk_rings.native, sring, PAGE_SIZE);
+		BACK_RING_INIT(&blkif->blk_rings.native, sring, ring_area_size);
 		break;
 	}
 	case BLKIF_PROTOCOL_X86_32:
 	{
 		blkif_x86_32_sring_t *sring_x86_32;
 		sring_x86_32 = (blkif_x86_32_sring_t *)blkif->blk_ring_area->addr;
-		BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32, PAGE_SIZE);
+		BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32, ring_area_size);
 		break;
 	}
 	case BLKIF_PROTOCOL_X86_64:
 	{
 		blkif_x86_64_sring_t *sring_x86_64;
 		sring_x86_64 = (blkif_x86_64_sring_t *)blkif->blk_ring_area->addr;
-		BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64, PAGE_SIZE);
+		BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64, ring_area_size);
 		break;
 	}
 	default:
@@ -134,7 +176,7 @@
 		blkif->domid, evtchn, blkif_be_int, 0, "blkif-backend", blkif);
 	if (err < 0)
 	{
-		unmap_frontend_page(blkif);
+		unmap_frontend_pages(blkif);
 		free_vm_area(blkif->blk_ring_area);
 		blkif->blk_rings.common.sring = NULL;
 		return err;
@@ -161,7 +203,7 @@
 	}
 
 	if (blkif->blk_rings.common.sring) {
-		unmap_frontend_page(blkif);
+		unmap_frontend_pages(blkif);
 		free_vm_area(blkif->blk_ring_area);
 		blkif->blk_rings.common.sring = NULL;
 	}
@@ -176,6 +218,7 @@
 
 void __init blkif_interface_init(void)
 {
+	IPRINTK("blkback supports rings of up to %d pages\n", BLKIF_MAX_NUM_RING_PAGES);
 	blkif_cachep = kmem_cache_create("blkif_cache", sizeof(blkif_t),
 					 0, 0, NULL, NULL);
 }
diff -r f9d9b9ff4bbc -r 87d43eab70d8 drivers/xen/blkback/xenbus.c
--- a/drivers/xen/blkback/xenbus.c	Wed Nov 26 17:00:33 2008 -0800
+++ b/drivers/xen/blkback/xenbus.c	Thu Nov 27 17:31:51 2008 -0800
@@ -469,18 +469,69 @@
 static int connect_ring(struct backend_info *be)
 {
 	struct xenbus_device *dev = be->dev;
-	unsigned long ring_ref;
+	unsigned long ring_ref[BLKIF_MAX_NUM_RING_PAGES];
 	unsigned int evtchn;
 	char protocol[64] = "";
-	int err;
+	char path[64];
+	int err, i;
 
 	DPRINTK("%s", dev->otherend);
 
-	err = xenbus_gather(XBT_NIL, dev->otherend, "ring-ref", "%lu", &ring_ref,
-			    "event-channel", "%u", &evtchn, NULL);
+	/* try the one-page ring handshake first */
+	err = xenbus_gather(XBT_NIL, dev->otherend,
+			    "ring-ref", "%lu", &ring_ref[0],
+			    NULL);
+
+	/* if reading a single page failed, try the multi-page ring handshake */
+	if (err) {
+		unsigned int num_ring_pages;
+
+		/* read the number of pages */
+		err = xenbus_gather(XBT_NIL, dev->otherend,
+				    "num-ring-pages", "%u", &num_ring_pages,
+				    NULL);
+		if (err) {
+			xenbus_dev_fatal(dev, err,
+					 "reading ring-ref or num-ring-pages");
+			return err;
+		}
+
+		/* sanity-check the number of pages */
+		if (num_ring_pages != 2 && num_ring_pages != 4) {
+			xenbus_dev_fatal(dev, -EINVAL,
+					 "invalid value for num-ring-pages");
+			return -1;
+		}
+
+		be->blkif->num_ring_pages = num_ring_pages;
+
+		for (i = 0; i < num_ring_pages; i++) {
+			char buf[10];
+			snprintf(buf, sizeof(buf), "ring-ref%d", i);
+			err = xenbus_gather(XBT_NIL, dev->otherend, buf,
+					    "%lu", &ring_ref[i],
+					    NULL);
+			if (err) {
+				xenbus_dev_fatal(dev, err,
+						 "reading %s/%s",
+						 dev->otherend, buf);
+				return err;
+			}
+		}
+
+		printk(KERN_INFO "Setting up ring with %d pages\n", num_ring_pages);
+	} else {
+		printk(KERN_INFO "Setting up single page ring\n");
+		be->blkif->num_ring_pages = 1;
+	}
+
+	err = xenbus_gather(XBT_NIL, dev->otherend,
+			    "event-channel", "%u", &evtchn,
+			    NULL);
+
 	if (err) {
 		xenbus_dev_fatal(dev, err,
-				 "reading %s/ring-ref and event-channel",
+				 "reading %s/event-channel",
 				 dev->otherend);
 		return err;
 	}
@@ -500,15 +551,19 @@
 		xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
 		return -1;
 	}
-	printk(KERN_INFO
-	       "blkback: ring-ref %ld, event-channel %d, protocol %d (%s)\n",
-	       ring_ref, evtchn, be->blkif->blk_protocol, protocol);
+
+	printk(KERN_INFO "blkback:");
+	for (i = 0; i < be->blkif->num_ring_pages; i++) {
+		printk(" ring-ref%d %ld, ", i, ring_ref[i]);
+	}
+	printk("event-channel %d, protocol %d (%s)\n",
+	       evtchn, be->blkif->blk_protocol, protocol);
 
 	/* Map the shared frame, irq etc. */
 	err = blkif_map(be->blkif, ring_ref, evtchn);
 	if (err) {
-		xenbus_dev_fatal(dev, err, "mapping ring-ref %lu port %u",
-				 ring_ref, evtchn);
+		xenbus_dev_fatal(dev, err,
+				 "mapping ring-refs and port %u", evtchn);
 		return err;
 	}
diff -r f9d9b9ff4bbc -r 87d43eab70d8 drivers/xen/blkfront/blkfront.c
--- a/drivers/xen/blkfront/blkfront.c	Wed Nov 26 17:00:33 2008 -0800
+++ b/drivers/xen/blkfront/blkfront.c	Thu Nov 27 17:31:51 2008 -0800
@@ -160,7 +160,13 @@
 {
 	const char *message = NULL;
 	struct xenbus_transaction xbt;
-	int err;
+	int err, i;
+
+	BUG_ON(BLK_NUM_RING_PAGES != 1 &&
+	       BLK_NUM_RING_PAGES != 2 &&
+	       BLK_NUM_RING_PAGES != 4);
+
+	IPRINTK("blkfront: setting up a %d-page ring\n", BLK_NUM_RING_PAGES);
 
 	/* Create shared ring, alloc event channel. */
 	err = setup_blkring(dev, info);
@@ -168,18 +174,47 @@
 		goto out;
 
 again:
+	DPRINTK("blkring xenbus transaction\n");
+
 	err = xenbus_transaction_start(&xbt);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "starting transaction");
 		goto destroy_blkring;
 	}
 
-	err = xenbus_printf(xbt, dev->nodename,
-			    "ring-ref","%u", info->ring_ref);
-	if (err) {
-		message = "writing ring-ref";
-		goto abort_transaction;
+	if (BLK_NUM_RING_PAGES == 1) {
+		/* use the backward-compatible handshake for a 1-page ring */
+		err = xenbus_printf(xbt, dev->nodename,
+				    "ring-ref","%u", info->ring_ref[0]);
+		if (err) {
+			message = "writing ring-ref";
+			goto abort_transaction;
+		}
+
+	} else {
+		/* use the new handshake for 2- and 4-page rings */
+		err = xenbus_printf(xbt, dev->nodename,
+				    "num-ring-pages",
+				    "%u", BLK_NUM_RING_PAGES);
+
+		if (err) {
+			message = "writing num-ring-pages";
+			goto abort_transaction;
+		}
+
+		for (i = 0; i < BLK_NUM_RING_PAGES; i++) {
+			char buf[10];
+			snprintf(buf, sizeof(buf), "ring-ref%d", i);
+			err = xenbus_printf(xbt, dev->nodename,
+					    buf, "%u", info->ring_ref[i]);
+			if (err) {
+				message = "writing ring-refs";
+				goto abort_transaction;
+			}
+
+		}
 	}
+
 	err = xenbus_printf(xbt, dev->nodename, "event-channel", "%u",
 			    irq_to_evtchn_port(info->irq));
 	if (err) {
@@ -220,25 +255,32 @@
 			 struct blkfront_info *info)
 {
 	blkif_sring_t *sring;
-	int err;
+	int i, order, err;
 
-	info->ring_ref = GRANT_INVALID_REF;
+	for (i = 0; i < BLK_NUM_RING_PAGES; i++)
+		info->ring_ref[i] = GRANT_INVALID_REF;
 
-	sring = (blkif_sring_t *)__get_free_page(GFP_NOIO | __GFP_HIGH);
+	order = get_order(BLK_RING_AREA_SIZE);
+	sring = (blkif_sring_t *)__get_free_pages(GFP_KERNEL, order);
 	if (!sring) {
 		xenbus_dev_fatal(dev, -ENOMEM, "allocating shared ring");
 		return -ENOMEM;
 	}
 	SHARED_RING_INIT(sring);
-	FRONT_RING_INIT(&info->ring, sring, PAGE_SIZE);
+	FRONT_RING_INIT(&info->ring, sring, BLK_RING_AREA_SIZE);
 
-	err = xenbus_grant_ring(dev, virt_to_mfn(info->ring.sring));
-	if (err < 0) {
-		free_page((unsigned long)sring);
-		info->ring.sring = NULL;
-		goto fail;
+	for (i = 0; i < BLK_NUM_RING_PAGES; i++) {
+		unsigned long addr =
+			(unsigned long)info->ring.sring + i * PAGE_SIZE;
+
+		err = xenbus_grant_ring(dev, virt_to_mfn(addr));
+		if (err < 0) {
+			free_pages((unsigned long)sring, order);
+			info->ring.sring = NULL;
+			goto fail;
+		}
+		info->ring_ref[i] = err;
 	}
-	info->ring_ref = err;
 
 	err = bind_listening_port_to_irqhandler(
 		dev->otherend_id, blkif_int, SA_SAMPLE_RANDOM, "blkif", info);
@@ -790,6 +832,8 @@
 
 static void blkif_free(struct blkfront_info *info, int suspend)
 {
+	int i;
+
 	/* Prevent new requests being issued until we fix things up. */
 	spin_lock_irq(&blkif_io_lock);
 	info->connected = suspend ?
@@ -805,12 +849,19 @@
 	flush_scheduled_work();
 
 	/* Free resources associated with old device channel. */
-	if (info->ring_ref != GRANT_INVALID_REF) {
-		gnttab_end_foreign_access(info->ring_ref,
-					  (unsigned long)info->ring.sring);
-		info->ring_ref = GRANT_INVALID_REF;
-		info->ring.sring = NULL;
+	for (i = 0; i < BLK_NUM_RING_PAGES; i++) {
+
+		if (info->ring_ref[i] != GRANT_INVALID_REF) {
+			gnttab_end_foreign_access(info->ring_ref[i], 0UL);
+			info->ring_ref[i] = GRANT_INVALID_REF;
+			info->ring.sring = NULL;
+		}
 	}
+
+	if (info->ring.sring)
+		free_pages((unsigned long)info->ring.sring,
+			   get_order(BLK_RING_AREA_SIZE));
+
 	if (info->irq)
 		unbind_from_irqhandler(info->irq, info);
 	info->irq = 0;
diff -r f9d9b9ff4bbc -r 87d43eab70d8 drivers/xen/blkfront/block.h
--- a/drivers/xen/blkfront/block.h	Wed Nov 26 17:00:33 2008 -0800
+++ b/drivers/xen/blkfront/block.h	Thu Nov 27 17:31:51 2008 -0800
@@ -57,6 +57,7 @@
 #include 
 
 #define DPRINTK(_f, _a...) pr_debug(_f, ## _a)
+#define IPRINTK(_f, _a...) printk(KERN_INFO _f, ## _a)
 
 #if 0
 #define DPRINTK_IOCTL(_f, _a...) printk(KERN_ALERT _f, ## _a)
@@ -86,7 +87,15 @@
 	unsigned long frame[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 };
 
-#define BLK_RING_SIZE __RING_SIZE((blkif_sring_t *)0, PAGE_SIZE)
+#ifndef CONFIG_XEN_BLKDEV_NUM_RING_PAGES
+#error "CONFIG_XEN_BLKDEV_NUM_RING_PAGES undefined!"
+#endif
+#if CONFIG_XEN_BLKDEV_NUM_RING_PAGES > BLKIF_MAX_NUM_RING_PAGES
+#error "CONFIG_XEN_BLKDEV_NUM_RING_PAGES too large"
+#endif
+#define BLK_NUM_RING_PAGES CONFIG_XEN_BLKDEV_NUM_RING_PAGES
+#define BLK_RING_AREA_SIZE (BLK_NUM_RING_PAGES * PAGE_SIZE)
+#define BLK_RING_SIZE __RING_SIZE((blkif_sring_t *)0, BLK_RING_AREA_SIZE)
 
 /*
  * We have one of these per vbd, whether ide, scsi or 'other'.  They
@@ -101,7 +110,7 @@
 	int vdevice;
 	blkif_vdev_t handle;
 	int connected;
-	int ring_ref;
+	int ring_ref[BLK_NUM_RING_PAGES];
 	blkif_front_ring_t ring;
 	unsigned int irq;
 	struct xlbd_major_info *mi;
diff -r f9d9b9ff4bbc -r 87d43eab70d8 include/xen/interface/io/blkif.h
--- a/include/xen/interface/io/blkif.h	Wed Nov 26 17:00:33 2008 -0800
+++ b/include/xen/interface/io/blkif.h	Thu Nov 27 17:31:51 2008 -0800
@@ -124,6 +124,11 @@
 
 DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response);
 
+/*
+ * Maximum number of pages used for a blkif ring.
+ */
+#define BLKIF_MAX_NUM_RING_PAGES 4
+
 #define VDISK_CDROM        0x1
 #define VDISK_REMOVABLE    0x2
 #define VDISK_READONLY     0x4