
Re: [Xen-devel] Tagging Xen 4.0.0 first release candidate / pygrub dom0 caching bug



On Thu, Jan 21, 2010 at 01:53:21PM -0800, Daniel Stodden wrote:
> On Thu, 2010-01-21 at 16:01 -0500, Pasi Kärkkäinen wrote:
> > On Thu, Jan 21, 2010 at 07:37:27PM +0000, Ian Campbell wrote:
> > > On Thu, 2010-01-21 at 19:16 +0000, Daniel Stodden wrote: 
> > > > On Thu, 2010-01-21 at 13:44 -0500, Daniel Stodden wrote:
> > > > > On Thu, 2010-01-21 at 07:28 -0500, Pasi Kärkkäinen wrote:
> > > > > > On Wed, Jan 06, 2010 at 03:50:05PM +0200, Pasi Kärkkäinen wrote:
> > > > > > > On Tue, Jan 05, 2010 at 06:42:05AM +0000, Keir Fraser wrote:
> > > > > > > > I plan to tag -rc1 later this week. If you have any outstanding 
> > > > > > > > patches,
> > > > > > > > please send them to the list now.
> > > > > > > > 
> > > > > > > 
> > > > > > > Hmm.. I just remembered this pygrub bug:
> > > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=466681
> > > > > > > 
> > > > > > > pygrub doesn't use O_DIRECT, so it sometimes reads stale data
> > > > > > > from the dom0 kernel page cache and fails to pick up the updated
> > > > > > > domU grub.conf.
> > > > > > > 
> > > > > > > Red Hat seems to have patches available for testing, but not for
> > > > > > > xen-unstable.
> > > > > > > 
> > > > > > > I've personally hit this bug many times.
> > > > > > > 
> > > > > > 
> > > > > > It seems the Red Hat guys have a fix available; they fixed the
> > > > > > problem by patching the dom0 kernel's blkback driver.
> > > > > > 
> > > > > > More details about the fix here:
> > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=466681
> > > > > > 
> > > > > > Should this be applied to the 2.6.18-xen and pv_ops dom0 kernels
> > > > > > as well?
> > > > > 
> > > > > Only to 2.6.18. 
> > > > > 
> > > > > It's obsolete after 2.6.27.
> > > > > O_DIRECT gained page cache invalidation in the meantime.
> > > > 
> > > > Aiiee, sorry. I guess this one only applies to tapdisks. The page cache
> > > > invalidation only covers the filemap. That obviously won't fix blkback
> > > > bios on raw devices.
> > > > 
> > > > Ian Campbell recently noted he came across a different fix, which adds
> > > > direct-io to e2fsprogs.
> > > 
> > > I noted the thread because the root problem seemed interesting and
> > > worthy of investigation, but I should have made it clear that I didn't
> > > think messing with direct-io in e2fsprogs was the correct solution. I
> > > think the majority of the participants in the thread thought that too.
> > > The biggest problem is that it only solves the issue in the one specific
> > > case of things which use e2fsprogs and not in general; we can't go round
> > > adding O_DIRECT to everything which might be used to access these disks.
> > > 
> > 
> > Yeah, it should be fixed in blkback.. who knows, some users might be using
> > other tools in dom0 as well, not just pygrub.
> 
> Fully agreed.
> 
> But: one thing about the RHEL patch isn't immediately clear to me. The
> invalidate step apparently goes into the VBD creation path.
> 

With the Red Hat blkback kernel patch/fix:

1) xm create domU
2) pygrub runs, populating the dom0 kernel page cache with data from the 
guest's disk
3) domU is started; the patched blkback driver flushes the dom0 page cache 
for that disk when the backend is created (sketched below)
4) grub.conf is modified in the guest
5) domU shuts down

6) xm create domU
7) pygrub runs and reads the updated grub.conf, since nothing is left in the 
dom0 kernel cache after the flush in step 3
8) domU is started; blkback again flushes the dom0 kernel cache to prevent 
future staleness

That's how I understood it..
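
To make the mechanism concrete, here is a minimal sketch of what step 3 could
look like inside blkback. This is not the actual Red Hat patch: the helper
name is made up, and invalidate_bdev() takes a second argument on 2.6.18-class
kernels but only the bdev on newer ones.

#include <linux/fs.h>
#include <linux/buffer_head.h>

/*
 * Hypothetical helper, called when blkback attaches the VBD: write out
 * whatever dom0 still holds dirty for the backing device, then drop the
 * clean cached pages so a later pygrub run cannot read data made stale
 * by the guest's direct bio writes.
 */
static void vbd_drop_dom0_cache(struct block_device *bdev)
{
	/* flush dom0's dirty pages for this device to disk */
	fsync_bdev(bdev);

	/* drop the buffer/page cache for the device
	 * (2.6.18 signature; newer kernels take only the bdev) */
	invalidate_bdev(bdev, 1);
}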

> I don't see why this is sufficient; my understanding was that pygrub
> would rather read stale data after boot, then run, then shutdown, then
> reboot.
> 
> Which rather suggests flushing during shutdown (?).
> 

Disk IO from the domU through blkfront is not cached in dom0,
so isn't it enough to flush when the disk backend is created?

pygrub is the only player here that puts anything in the dom0 cache.
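
For comparison, the per-tool alternative (the approach Ian argued we can't
keep extending) would be for every dom0 tool to drop the device's cached
pages itself before reading, e.g. via the BLKFLSBUF ioctl or by opening with
O_DIRECT. A rough userspace sketch, with an example device path, just to
illustrate:

#include <fcntl.h>
#include <linux/fs.h>     /* BLKFLSBUF */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	const char *dev = "/dev/vg0/guest-disk";   /* example path only */
	int fd = open(dev, O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* drop dom0's cached buffers for this block device (needs root),
	 * so a stale copy of the guest's grub.conf is not read back */
	if (ioctl(fd, BLKFLSBUF, 0) < 0)
		perror("BLKFLSBUF");

	/* ... read the partition table / filesystem as usual ... */
	close(fd);
	return 0;
}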

> Or rather on both ends. Because 1) installing a guest by copying a VDI
> image, then 2) failing to properly close the raw device so the caches get
> flushed, before 3) booting the VM is another potential problem.
> 
> We have seen the latter become an issue in the past.
> 

I guess it wouldn't hurt to also flush during shutdown.. ?

-- Pasi


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

