Re: [Xen-devel] [PATCH] blkback: Fix block I/O latency issue
On Tue, 2011-05-03 at 13:16 -0400, Vincent, Pradeep wrote:
> Yes - I am positive what I am seeing isn't 'I/O scheduler issue due to
> REQ_SYNC'. Trace data from blkback showed that blkback was simply not
> submitting the 2nd I/O to the I/O scheduler. Type of I/O (read vs write)
> doesn't matter.

Block trace?

Just to make sure we're debugging the right thing -- what I/O scheduler
are you running?

Daniel

> Recreation Steps:
>
> 1. Generate I/O requests so that two I/Os are pending at any given time.
>    The I/O submissions shouldn't be synchronized. Potentially use two
>    threads for I/O submissions, each submitting a small random direct I/O.
> 2. Verify that the guest sends out two I/Os at a given time. 'iostat'
>    avgqu-sz will be '2'.
> 3. Now check iostat in Dom-0 for the corresponding block device. avgqu-sz
>    will be '1'.
> 4. An 'await' comparison in DomU vs Dom-0 will show a fairly big difference.
>
> And I confirmed that the patch I submitted fixes this issue.
>
> - Pradeep Vincent
>
>
> On 5/3/11 7:55 AM, "Konrad Rzeszutek Wilk" <konrad.wilk@xxxxxxxxxx> wrote:
>
> >On Mon, May 02, 2011 at 06:10:22PM -0700, Vincent, Pradeep wrote:
> >> Thanks Jan.
> >>
> >> Re: avoid unnecessary notification
> >>
> >> If this was a deliberate design choice, then the duration of the delay
> >> is at the mercy of the pending I/O latencies and I/O patterns, and the
> >> delay is simply too long in some cases. E.g. a write I/O stuck behind a
> >> read I/O could see more than double the latency on a Xen guest compared
> >> to a bare-metal host. Avoiding notifications this way results in
> >> significant latency degradation perceived by many applications.
> >
> >You sure this is not the fault of the IO scheduler? I had similar issues
> >with the CFQ scheduler upstream and found out that I needed to add
> >REQ_SYNC on write requests.
> >>
> >> If this is about allowing the I/O scheduler to coalesce more I/Os, then
> >> I bet the I/O scheduler's 'wait and coalesce' logic is a great
> >> substitute for the delays introduced by blkback.
> >>
> >> I totally agree IRQ coalescing or delay is useful for both blkback and
> >> netback, but we need logic that doesn't impact I/O latencies
> >> significantly. Also, I don't think netback has this type of
> >> notification-avoidance logic (at least in the 2.6.18 code base).
> >>
> >>
> >> Re: Other points
> >>
> >> Good call. Changed the patch to use tabs.
> >>
> >> I wasn't very sure about the blk_ring_lock usage and I should have
> >> clarified it before sending out the patch.
> >>
> >> Assuming blk_ring_lock was meant to protect shared-ring manipulations
> >> within blkback, is there a reason the 'blk_rings->common.req_cons'
> >> manipulation in do_block_io_op is not protected? The reasons for the
> >> differences between the locking logic in do_block_io_op and
> >> make_response weren't terribly obvious, although the failure mode for
> >> the race condition may very well be benign.
> >>
> >> Anyway, I am attaching a patch with the appropriate changes.
> >>
> >> Jeremy, can you apply this patch to pvops Dom-0
> >> (http://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git)? Should
> >> I submit another patch for the 2.6.18 Dom-0?
> >>
> >>
> >> Signed-off-by: Pradeep Vincent <pradeepv@xxxxxxxxxx>
> >>
> >> diff --git a/drivers/xen/blkback/blkback.c b/drivers/xen/blkback/blkback.c
> >> --- a/drivers/xen/blkback/blkback.c
> >> +++ b/drivers/xen/blkback/blkback.c
> >> @@ -315,6 +315,7 @@ static int do_block_io_op(blkif_t *blkif)
> >>  	pending_req_t *pending_req;
> >>  	RING_IDX rc, rp;
> >>  	int more_to_do = 0;
> >> +	unsigned long flags;
> >>
> >>  	rc = blk_rings->common.req_cons;
> >>  	rp = blk_rings->common.sring->req_prod;
> >> @@ -383,6 +384,15 @@ static int do_block_io_op(blkif_t *blkif)
> >>  		cond_resched();
> >>  	}
> >>
> >> +	/* If blkback might go to sleep (i.e. more_to_do == 0) then we better
> >> +	   let blkfront know about it (by setting req_event appropriately) so
> >> +	   that blkfront will bother to wake us up (via interrupt) when it
> >> +	   submits a new I/O */
> >> +	if (!more_to_do) {
> >> +		spin_lock_irqsave(&blkif->blk_ring_lock, flags);
> >> +		RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> >> +		spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
> >> +	}
> >>  	return more_to_do;
> >> }
> >>
> >>
> >> On 5/2/11 1:13 AM, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
> >>
> >> >>>> On 02.05.11 at 09:04, "Vincent, Pradeep" <pradeepv@xxxxxxxxxx> wrote:
> >> >> In the blkback driver, after I/O requests are submitted to the Dom-0
> >> >> block I/O subsystem, blkback effectively goes to 'sleep' without
> >> >> letting blkfront know about it (req_event isn't set appropriately).
> >> >> Hence blkfront doesn't notify blkback when it submits a new I/O, thus
> >> >> delaying the 'dispatch' of the new I/O to the Dom-0 block I/O
> >> >> subsystem. The new I/O is dispatched only once one of the previous
> >> >> I/Os completes.
> >> >>
> >> >> As a result of this issue, block I/O latency is degraded for some
> >> >> workloads on Xen guests using the blkfront-blkback stack.
> >> >>
> >> >> The following change addresses this issue:
> >> >>
> >> >>
> >> >> Signed-off-by: Pradeep Vincent <pradeepv@xxxxxxxxxx>
> >> >>
> >> >> diff --git a/drivers/xen/blkback/blkback.c b/drivers/xen/blkback/blkback.c
> >> >> --- a/drivers/xen/blkback/blkback.c
> >> >> +++ b/drivers/xen/blkback/blkback.c
> >> >> @@ -383,6 +383,12 @@ static int do_block_io_op(blkif_t *blkif)
> >> >>  		cond_resched();
> >> >>  	}
> >> >>
> >> >> +	/* If blkback might go to sleep (i.e. more_to_do == 0) then we better
> >> >> +	   let blkfront know about it (by setting req_event appropriately)
> >> >> +	   so that blkfront will bother to wake us up (via interrupt) when
> >> >> +	   it submits a new I/O */
> >> >> +	if (!more_to_do)
> >> >> +		RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> >> >
> >> >To me this contradicts the comment preceding the use of
> >> >RING_FINAL_CHECK_FOR_REQUESTS() in make_response()
> >> >(there it's supposedly used to avoid unnecessary notification,
> >> >here you say it's used to force notification). Albeit I agree that
> >> >the change looks consistent with the comments in io/ring.h.
> >> >
> >> >Even if correct, you're not holding blkif->blk_ring_lock here, and
> >> >hence I think you'll need to explain how this is not a problem.
> >> >
> >> >From a formal perspective, you also want to correct usage of tabs,
> >> >and (assuming this is intended for the 2.6.18 tree) you'd also need
> >> >to indicate so for Keir to pick this up and apply it to that tree (and
> >> >it might then also be a good idea to submit an equivalent patch for
> >> >the pv-ops trees).
> >> >
> >> >Jan
> >> >
> >> >>  	return more_to_do;
> >> >> }
> >> >
> >> >
> >
> >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
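[Editorial note] For readers following the thread: the handshake the patch relies on works roughly as follows. The shared ring carries a req_event index; the front end only sends an event when req_prod moves past req_event, and the back end is expected to re-arm req_event (and then re-check the ring, to close the race window) before it goes to sleep. The sketch below is a minimal, self-contained model of that idiom, assuming the semantics documented in xen/include/public/io/ring.h; the struct and the helpers push_request() and final_check() are simplified stand-ins written for illustration, not the actual Xen ring macros.

/*
 * Minimal model of the req_event handshake discussed above. This is an
 * illustration, not the Xen header: the field names mirror the shared
 * ring, but req_cons is folded into the same struct for brevity and the
 * real macros add memory barriers around these steps.
 */
#include <stdio.h>

struct ring_model {
	unsigned int req_prod;   /* written by the front end (producer) */
	unsigned int req_event;  /* front end notifies when req_prod passes this */
	unsigned int req_cons;   /* back end's consumer pointer */
};

/* Front-end side: publish one request and decide whether to send an event,
 * using the unsigned-wraparound window test the ring interface describes. */
static int push_request(struct ring_model *r)
{
	unsigned int old_prod = r->req_prod;
	unsigned int new_prod = ++r->req_prod;

	return (new_prod - r->req_event) < (new_prod - old_prod);
}

/* Back-end side: the "final check" before sleeping. Re-arm req_event so
 * the next request raises an event, then look again in case a request
 * slipped in while we were re-arming. */
static int final_check(struct ring_model *r)
{
	if (r->req_prod != r->req_cons)
		return 1;
	r->req_event = r->req_cons + 1;
	/* the real code issues a memory barrier here */
	return r->req_prod != r->req_cons;
}

int main(void)
{
	struct ring_model r = { 0, 1, 0 };

	/* Pre-patch behaviour: the back end never advances req_event, so
	 * only the first request raises an event. */
	printf("request 1 notifies: %d\n", push_request(&r));    /* 1 */
	r.req_cons = r.req_prod;                                  /* back end consumes it */
	printf("request 2 notifies: %d\n", push_request(&r));    /* 0 -- the stalled I/O */

	/* With the patch: run the final check before sleeping, which re-arms
	 * req_event, so the next request is notified immediately. */
	r.req_cons = r.req_prod;
	printf("final check finds work: %d\n", final_check(&r)); /* 0, but re-armed */
	printf("request 3 notifies: %d\n", push_request(&r));    /* 1 */
	return 0;
}

On the locking question raised in the thread, one plausible reading is that the revised patch takes blkif->blk_ring_lock around the final check because make_response() performs the same re-arm-and-recheck from the completion path, and the two req_event updates shouldn't race; this is an interpretation of the discussion above, not a statement about the committed code.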