
Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot



> -----Original Message-----
> From: Paul Durrant
> Sent: 12 June 2017 15:43
> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>; 'Jan Beulich'
> <JBeulich@xxxxxxxx>
> Cc: Juergen Gross <jgross@xxxxxxxx>; Andrew Cooper
> <Andrew.Cooper3@xxxxxxxxxx>; Julien Grall (julien.grall@xxxxxxx)
> <julien.grall@xxxxxxx>; 'Boris Ostrovsky' <boris.ostrovsky@xxxxxxxxxx>;
> xen-devel(xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-
> devel@xxxxxxxxxxxxxxxxxxxx>
> Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> 
> > -----Original Message-----
> > From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of
> > Paul Durrant
> > Sent: 12 June 2017 15:29
> > To: 'Jan Beulich' <JBeulich@xxxxxxxx>
> > Cc: Juergen Gross <jgross@xxxxxxxx>; Andrew Cooper
> > <Andrew.Cooper3@xxxxxxxxxx>; Julien Grall (julien.grall@xxxxxxx)
> > <julien.grall@xxxxxxx>; 'Boris Ostrovsky' <boris.ostrovsky@xxxxxxxxxx>;
> > xen-devel(xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-
> > devel@xxxxxxxxxxxxxxxxxxxx>
> > Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> >
> > > -----Original Message-----
> > > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > > Sent: 12 June 2017 14:55
> > > To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> > > Cc: Julien Grall (julien.grall@xxxxxxx) <julien.grall@xxxxxxx>; Andrew
> > > Cooper <Andrew.Cooper3@xxxxxxxxxx>; xen-devel(xen-
> > > devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>; 'Boris
> > > Ostrovsky' <boris.ostrovsky@xxxxxxxxxx>; Juergen Gross
> > > <jgross@xxxxxxxx>
> > > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> > >
> > > >>> On 12.06.17 at 14:05, <Paul.Durrant@xxxxxxxxxx> wrote:
> > > >>  -----Original Message-----
> > > >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > > >> Sent: 12 June 2017 12:12
> > > >> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> > > > >> Cc: Julien Grall (julien.grall@xxxxxxx) <julien.grall@xxxxxxx>;
> > > > >> Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>; xen-devel(xen-
> > > >> devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>; 'Boris
> > > >> Ostrovsky' <boris.ostrovsky@xxxxxxxxxx>; Juergen Gross
> > > >> <jgross@xxxxxxxx>
> > > >> Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> > > >>
> > > >> >>> On 12.06.17 at 12:53, <Paul.Durrant@xxxxxxxxxx> wrote:
> > > >> >>  -----Original Message-----
> > > >> > [snip]
> > > >> >> > >
> > > > >> >> > > What do you think it best to do for Xen 4.9? Hardcoding a 4k
> > > > >> >> > > alignment is clearly easy and would work around this BIOS
> > > > >> >> > > issue but, as you say, it does grow the image. Reverting
> > > > >> >> > > Juergen's patch also works round the issue, but that is more
> > > > >> >> > > by luck. Re-working the code is preferable, but I guess it's
> > > > >> >> > > too late to introduce such code-churn in 4.9.
> > > >> >> >
> > > > >> >> > Reverting Jürgen's code is out of the question with all the
> > > > >> >> > information you've gathered by now. I think re-working the EDD
> > > > >> >> > code slightly is the best option. Would you mind giving the
> > > > >> >> > attached patch a try? This still slightly grows the trampoline
> > > > >> >> > due to a few more instructions being needed, but should still
> > > > >> >> > be far better than embedding a whole 4k buffer (and then later
> > > > >> >> > finding a BIOS/disk combination which wants even more). Note
> > > > >> >> > that I've left a tiny bit of debugging code in there.
> > > >> >> >
> > > >> >>
> > > >> >> Sure, I'll give that a go now.
> > > >> >>
> > > >> >
> > > >> > That worked fine:
> > > >> >
> > > >> > (XEN) MBR[80] @ 85e0 (86000)
> > > >>
> > > >> But that's contrary to your earlier findings: Didn't you say simply
> > > >> avoiding a 4k-boundary wasn't enough? And it certainly tells us
> > > >> that this isn't a 4k drive (or at least the BIOS doesn't surface 4k
> > > >> sectors) - I was really expecting a larger gap between the two
> > > >> logged values.
> > > >>
> > > >
> > > > I'll go dump out the edd and double check what it is saying.
> > > >
> > > > > My findings indicated that the problem seemed to be caused by doing a
> > > > > read that spanned a 4k boundary, so using 0x85e00 would be safe. The
> > > > > anomaly was that simply aligning the edd_info buffer on a 512-byte
> > > > > boundary and continuing to use that for reading did not work.
> > >
> > > But a 512-byte aligned 512-byte buffer can't possibly cross a page
> > > boundary.
> >
> > Indeed, which is why I was perplexed. I found that 0x60e00 was ok. Your
> > patch chose 0x85e00, which was ok too, but for some reason a '.align 512' in
> > front of boot_edd_info yielded an address which was not ok. I just checked
> > what address that yielded though (by booting with edd=off to avoid the
> > hang) and it was 0x86f40... which clearly means that '.align 512' is not
> > doing what I thought it would do.
> 
> No, the problem turns out to be the GLOBAL() macro which, in assembly
> files, contains an implicit .align 16!
> 

No, I misread. ENTRY() contains the implicit align.

It's clearly even more subtle. Running objdump tells me the symbol is indeed
512-byte aligned, but when it ends up in memory it's clearly not. So I guess it
must be down to how the trampoline is loaded. Thus, not using a buffer within
the trampoline image is most definitely the best idea.
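
For reference, the address arithmetic discussed in this thread can be checked
with a minimal sketch like the one below. The helper names are illustrative
only and are not taken from the Xen tree. The three addresses are the ones
reported above, a 512-byte read is assumed, and the image offset and load base
in the last helper's example are purely hypothetical values picked to
reproduce 0x86f40.

```python
PAGE_SIZE = 4096    # 4k boundary the BIOS appears to be sensitive to
SECTOR_SIZE = 512   # assumed size of the MBR read


def misalignment(addr, align):
    """Return how far addr sits past the previous multiple of align."""
    return addr % align


def crosses_boundary(addr, length, boundary=PAGE_SIZE):
    """Return True if the byte range [addr, addr + length) spans a boundary."""
    return addr // boundary != (addr + length - 1) // boundary


def loaded_misalignment(image_offset, load_base, align):
    """Misalignment of a symbol once the image is copied to load_base.

    An offset that is aligned within the image stays aligned in memory only
    if load_base is itself a multiple of align.
    """
    return misalignment(load_base + image_offset, align)


if __name__ == "__main__":
    # Addresses reported in the thread.
    for addr in (0x60E00, 0x85E00, 0x86F40):
        print(hex(addr),
              "misaligned by", hex(misalignment(addr, SECTOR_SIZE)),
              "crosses 4k:", crosses_boundary(addr, SECTOR_SIZE))

    # Hypothetical numbers: a 512-aligned image offset and an unaligned base.
    print(hex(loaded_misalignment(0xE00, 0x86140, SECTOR_SIZE)))
```

Under these assumptions, 0x60e00 and 0x85e00 are both 512-byte aligned and a
512-byte read from either stays within one 4k page, whereas 0x86f40 sits 0x140
past a 512-byte boundary and a 512-byte read from it runs past 0x87000.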

  Paul

>   Paul
> 
> >
> >   Paul
> >
> > >
> > > >> > so you can add my Tested-by to that.
> > > >>
> > > >> I.e. I'm not sure about this, as I'm still uncertain whether some
> > > >> corruption didn't again occur. Of course APs coming up properly
> > > >> would already be a relatively good sign (as now the permanent
> > > >> part of the trampoline would be the predestined area for
> > > >> corruption to occur in).
> > > >>
> > > >
> > > > > None of my findings ever indicated memory corruption (although there,
> > > > > of course, may have been some that I happened to miss), but rather
> > > > > misbehaviour of the int13 handler itself - either locking up, having
> > > > > odd effects (e.g. black screen), or both.
> > >
> > > Ah, I didn't understand it this way so far, and instead had implied
> > > that the handler did return, but corrupt our trampoline area in
> > > one way or another.
> > >
> > > Jan
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxx
> > https://lists.xen.org/xen-devel