Re: [Xen-devel] blktap wedges when block-attached to dom0
Any chance this will be refreshed for 2.6.18? I very much enjoy being
able to block-attach in domain 0, but am less enamoured of the frequent
hangs when I fsck those devices...

On Tuesday, 02 January 2007 at 17:37, jake wrote:
> blktap devices attached to dom0 are liable to wedge during IO transfers.
> The problem does not occur in typical usage scenarios (i.e., virtual
> devices attached to guest domains); it is unique to the unanticipated
> case in which virtual devices are attached to dom0.
>
> The problem arises when processes in dom0 generate a large number of
> dirty pages while writing to a block-attached device. Once the number
> of dirty pages reaches a certain threshold, the dom0 kernel begins
> throttling IO in balance_dirty_pages; processes traversing the buffered
> IO path will block in this function until the number of dirty pages
> decreases.
>
> This is bad for the tapdisk process, which is responsible for servicing
> IO requests from the blktap driver. The tapdisk process normally
> performs direct IO, but if it writes to a hole in a sparse file, it
> falls into the buffered IO path. If the tapdisk process blocks in
> balance_dirty_pages, it will do so indefinitely, because it is the only
> process that cleans the pages dirtied by the processes writing to the
> virtual device. Thus dirty pages continue to amass in dom0 as IO is
> performed on the virtual device, but none of them make it to the
> physical devices because the tapdisk process is unable to service the
> requests.
>
> Note that when used as originally intended, blktap does not suffer from
> this problem: when blktap devices are attached to guest domains,
> performing IO on them dirties pages in the guest domain, not in dom0, so
> the tapdisk process doesn't get throttled in balance_dirty_pages.
>
> Attached is a patch that eschews the dom0 problem by exempting the
> tapdisk process from blocking in balance_dirty_pages. tapdisk processes
> servicing dom0-attached devices are granted special status using a
> modified setpriority syscall; a check in balance_dirty_pages ensures
> that such processes do not block indefinitely.
>
> This is clearly a hacky solution; any suggestions for improvement are
> welcome.

> # HG changeset patch
> # User Jake Wires <jwires@xxxxxxxxxxxxx>
> # Date 1166551978 28800
> # Node ID 34c6a9a2983ae46fad5dbba7e4b49520fb639a8c
> # Parent df1e7ae878b4badf4e5555df12a1c4d233170fb9
> [BLKTAP] prevent tapdisk processes from blocking in balance_dirty_pages
>
> This patch mods the setpriority syscall to enable marking processes as special
> IO processes. IO processes are exempted from blocking in balance_dirty_pages.
> This patch is intended to avoid deadlocks when block-attaching a blktap VDI to
> dom0.
>
> diff -r df1e7ae878b4 -r 34c6a9a2983a patches/linux-2.6.16.33/series
> +++ b/patches/linux-2.6.16.33/series	Tue Dec 19 10:12:58 2006 -0800
> @@ -5,6 +5,7 @@ git-4bfaaef01a1badb9e8ffb0c0a37cd2379008
>  git-4bfaaef01a1badb9e8ffb0c0a37cd2379008d21f.patch
>  linux-2.6.19-rc1-kexec-move_segment_code-x86_64.patch
>  blktap-aio-16_03_06.patch
> +blktap-ioprio.patch
>  device_bind.patch
>  fix-hz-suspend.patch
>  fix-ide-cd-pio-mode.patch
> diff -r df1e7ae878b4 -r 34c6a9a2983a tools/blktap/drivers/blktapctrl.c
> +++ b/tools/blktap/drivers/blktapctrl.c	Tue Dec 19 10:12:58 2006 -0800
> @@ -51,6 +51,7 @@
>  #include <xs.h>
>  #include <printf.h>
>  #include <sys/time.h>
> +#include <sys/resource.h>
>  #include <syslog.h>
>  
>  #include "blktaplib.h"
> @@ -535,6 +536,14 @@ int blktapctrl_new_blkif(blkif_t *blkif)
>  			goto fail;
>  		}
>  
> +		/* exempt tapdisk from flushing when attached to dom0 */
> +		if (blkif->domid == 0)
> +			if (setpriority(PRIO_PROCESS,
> +					blkif->tappid, PRIO_SPECIAL_IO)) {
> +				DPRINTF("Unable to prioritize tapdisk proc\n");
> +				goto fail;
> +			}
> +
>  		/* Both of the following read and write calls will block up to
>  		 * max_timeout val*/
>  		if (write_msg(blkif->fds[WRITE], CTLMSG_PARAMS, blkif, ptr)
> diff -r df1e7ae878b4 -r 34c6a9a2983a tools/blktap/lib/blktaplib.h
> +++ b/tools/blktap/lib/blktaplib.h	Tue Dec 19 10:12:58 2006 -0800
> @@ -57,6 +57,8 @@
>  #define BLKTAP_QUERY_ALLOC_REQS 8
>  #define BLKTAP_IOCTL_FREEINTF 9
>  #define BLKTAP_IOCTL_PRINT_IDXS 100
> +
> +#define PRIO_SPECIAL_IO -9999
>  
>  /* blktap switching modes: (Set with BLKTAP_IOCTL_SETMODE) */
>  #define BLKTAP_MODE_PASSTHROUGH 0x00000000 /* default */
> diff -r df1e7ae878b4 -r 34c6a9a2983a patches/linux-2.6.16.33/blktap-ioprio.patch
> +++ b/patches/linux-2.6.16.33/blktap-ioprio.patch	Tue Dec 19 10:12:58 2006 -0800
> @@ -0,0 +1,81 @@
> +diff -pruN ../orig-linux-2.6.16.33/include/linux/sched.h ./include/linux/sched.h
> +--- ../orig-linux-2.6.16.33/include/linux/sched.h	2006-12-18 18:42:00.000000000 -0800
> ++++ ./include/linux/sched.h	2006-12-18 18:46:07.000000000 -0800
> +@@ -706,6 +706,7 @@ struct task_struct {
> + 	prio_array_t *array;
> + 
> + 	unsigned short ioprio;
> ++	short special_prio;
> + 
> + 	unsigned long sleep_avg;
> + 	unsigned long long timestamp, last_ran;
> +diff -pruN ../orig-linux-2.6.16.33/include/linux/resource.h ./include/linux/resource.h
> +--- ../orig-linux-2.6.16.33/include/linux/resource.h	2006-12-18 18:42:00.000000000 -0800
> ++++ ./include/linux/resource.h	2006-12-18 18:44:35.000000000 -0800
> +@@ -44,6 +44,7 @@ struct rlimit {
> + 
> + #define PRIO_MIN (-20)
> + #define PRIO_MAX 20
> ++#define PRIO_SPECIAL_IO -9999
> + 
> + #define PRIO_PROCESS 0
> + #define PRIO_PGRP 1
> +diff -pruN ../orig-linux-2.6.16.33/include/linux/init_task.h ./include/linux/init_task.h
> +--- ../orig-linux-2.6.16.33/include/linux/init_task.h	2006-12-18 18:42:00.000000000 -0800
> ++++ ./include/linux/init_task.h	2006-12-18 18:45:56.000000000 -0800
> +@@ -85,6 +85,7 @@ extern struct group_info init_groups;
> + 	.lock_depth = -1, \
> + 	.prio = MAX_PRIO-20, \
> + 	.static_prio = MAX_PRIO-20, \
> ++	.special_prio = 0, \
> + 	.policy = SCHED_NORMAL, \
> + 	.cpus_allowed = CPU_MASK_ALL, \
> + 	.mm = NULL, \
> +diff -pruN ../orig-linux-2.6.16.33/kernel/sys.c ./kernel/sys.c
> +--- ../orig-linux-2.6.16.33/kernel/sys.c	2006-12-18 18:42:00.000000000 -0800
> ++++ ./kernel/sys.c	2006-12-18 18:43:30.000000000 -0800
> +@@ -245,6 +245,11 @@ static int set_one_prio(struct task_stru
> + 		error = -EPERM;
> + 		goto out;
> + 	}
> ++	if (niceval == PRIO_SPECIAL_IO) {
> ++		p->special_prio = PRIO_SPECIAL_IO;
> ++		error = 0;
> ++		goto out;
> ++	}
> + 	if (niceval < task_nice(p) && !can_nice(p, niceval)) {
> + 		error = -EACCES;
> + 		goto out;
> +@@ -272,10 +277,15 @@ asmlinkage long sys_setpriority(int whic
> + 
> + 	/* normalize: avoid signed division (rounding problems) */
> + 	error = -ESRCH;
> +-	if (niceval < -20)
> +-		niceval = -20;
> +-	if (niceval > 19)
> +-		niceval = 19;
> ++	if (niceval == PRIO_SPECIAL_IO) {
> ++		if (which != PRIO_PROCESS)
> ++			return -EINVAL;
> ++	} else {
> ++		if (niceval < -20)
> ++			niceval = -20;
> ++		if (niceval > 19)
> ++			niceval = 19;
> ++	}
> + 
> + 	read_lock(&tasklist_lock);
> + 	switch (which) {
> +diff -pruN ../orig-linux-2.6.16.33/mm/page-writeback.c ./mm/page-writeback.c
> +--- ../orig-linux-2.6.16.33/mm/page-writeback.c	2006-12-19 10:03:59.000000000 -0800
> ++++ ./mm/page-writeback.c	2006-12-19 10:04:17.000000000 -0800
> +@@ -231,6 +231,9 @@ static void balance_dirty_pages(struct a
> + 			pages_written += write_chunk - wbc.nr_to_write;
> + 			if (pages_written >= write_chunk)
> + 				break;		/* We've done our duty */
> ++			if (current->special_prio == PRIO_SPECIAL_IO)
> ++				break;		/* Exempt IO processes */
> ++
> + 		}
> + 		blk_congestion_wait(WRITE, HZ/10);
> + 	}

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
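For anyone trying to picture the wedge described in the quoted message, here is a toy user-space model of it. It is only a sketch and not kernel code: struct model, would_block and run_model are hypothetical names invented for illustration, and it assumes the worst case in which tapdisk always takes the buffered IO path (i.e. every write lands in a sparse-file hole) and the writer dirties pages faster than tapdisk cleans them.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the dirty-page throttle. Hypothetical names, not kernel code. */
struct model {
	int dirty;	/* dirty pages currently outstanding in dom0 */
	int limit;	/* threshold at which buffered writers are throttled */
	bool exempt;	/* true if tapdisk is marked as a special IO process */
};

/* Would this process block in the throttle right now? */
static bool would_block(const struct model *m, bool is_tapdisk)
{
	if (m->dirty < m->limit)
		return false;		/* below threshold: nobody is throttled */
	if (is_tapdisk && m->exempt)
		return false;		/* the exemption the patch introduces */
	return true;
}

/*
 * Run up to `steps` rounds. In each round the writer dirties two pages
 * unless throttled, and tapdisk cleans one page unless throttled.
 * Returns true if the model wedges (writer and tapdisk both blocked,
 * so nothing can ever reduce the dirty count again).
 */
static bool run_model(struct model *m, int steps)
{
	assert(m->limit > 0 && steps >= 0);

	for (int i = 0; i < steps; i++) {
		bool writer_blocked = would_block(m, false);
		bool tapdisk_blocked = would_block(m, true);

		if (writer_blocked && tapdisk_blocked)
			return true;	/* the only cleaner is stuck: wedge */
		if (!writer_blocked)
			m->dirty += 2;
		if (!tapdisk_blocked && m->dirty > 0)
			m->dirty -= 1;
	}
	return false;
}
```

Calling run_model on a model with exempt set to false wedges as soon as the dirty count reaches the limit, because the only process that could clean pages is itself throttled. With exempt set to true the writer still gets throttled, but tapdisk keeps draining pages and the dirty count hovers around the limit instead of sticking there.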