
Re: [Xen-devel] [PATCH] libxl: Increase device model startup timeout to 1min.



On Tue, 30 Jun 2015, Ian Jackson wrote:
> > >   * The number and nature of parallel operations done in the stress
> > >     test is unreasonable for the provided hardware:
> > >       => the timeout is fine
> > 
> > I don't know if it is our place to make this call.  Should we really be
> > deciding what is considered "reasonable"? I think not. Defining what is
> > reasonable and policies that match it is not a route I think we should
> > take in libxl.
> 
> Nevertheless if we are defining timeouts we are implicitly setting
> some parameters which imply that certain configurations are
> unreasonable.  Hopefully all such configurations are absurd.
> 
> If what you mean is that our bounds of `reasonable' should be very
> wide, then I agree.  If anyone could reasonably expect it to work,
> then that is fine.  Certainly we should refrain from subjective
> judgements.

OK.  How do you measure reasonable for this case?

What I actually mean to ask is how do you suggest we proceed on this
problem?

Of course it would be nice if we knew exactly why this is happening, but
the issue only happens once every 2-3 tempest runs, each of which takes
about 1 hour.  Tempest executes about 1300 tests in each run, some
of them in parallel. We haven't taken the time to read all the tests run
by tempest, so we don't know exactly what they do.
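The failure rate quoted above also gives a rough idea of how many
tempest runs any change would need before a clean streak means
anything. The sketch below is only a back-of-the-envelope estimate: it
assumes runs are independent and that each one fails with a fixed
probability, neither of which has been verified, and the function name
is made up for illustration.

```python
import math


def clean_runs_needed(failure_rate_per_run, confidence=0.95):
    """Return the smallest n such that n consecutive clean runs would
    happen by chance with probability <= (1 - confidence), assuming the
    underlying failure rate had not changed at all."""
    if not 0.0 < failure_rate_per_run < 1.0:
        raise ValueError("failure rate must be strictly between 0 and 1")
    p_clean = 1.0 - failure_rate_per_run
    return math.ceil(math.log(1.0 - confidence) / math.log(p_clean))


# "Once every 2-3 runs" is taken here as a per-run failure
# probability of 1/2 or 1/3 (an assumption, not a measurement).
for runs_per_failure in (2, 3):
    n = clean_runs_needed(1.0 / runs_per_failure)
    print(f"1 failure per {runs_per_failure} runs: {n} clean runs needed")
```

Under those assumptions this works out to 5 to 8 consecutive clean
runs, i.e. roughly 5 to 8 hours of tempest at about 1 hour per run.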

We don't really know the environment that causes the failure. Reading
all the tests is not an option. We could try adding more tracing to the
system, but given the type of error, if we do we are not likely to
reproduce the error at all, or maybe reproduce something different.


Given the state of things, I suggest we make sure that increasing the
timeout actually fixes or works around the problem. I would also like to
see some empirical measurements that tell us by how much we should
increase the timeout. Is 1 minute actually enough?
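To make "empirical measurements" concrete, something like the sketch
below could summarise device model startup times once they have been
collected. The sample durations are invented purely to show the shape
of the analysis, the 10 and 30 second candidates are arbitrary
comparison points next to the proposed 60 seconds, and how the
durations would be gathered (for example from log timestamps) is left
open.

```python
import math


def fraction_over(durations_s, timeout_s):
    """Fraction of observed startups that took longer than timeout_s."""
    return sum(1 for d in durations_s if d > timeout_s) / len(durations_s)


def nearest_rank_percentile(durations_s, pct):
    """Nearest-rank percentile: the smallest observed value such that at
    least pct percent of the observations are <= that value."""
    ordered = sorted(durations_s)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical startup durations in seconds; NOT real measurements.
samples = [1.2, 0.9, 2.5, 1.1, 14.0, 3.3, 1.0, 41.7, 1.4, 2.0]

for timeout in (10, 30, 60):
    share = fraction_over(samples, timeout)
    print(f"timeout {timeout:>2}s: {share:.0%} of startups would time out")

print("90th percentile:", nearest_rank_percentile(samples, 90), "s")
print("slowest startup:", nearest_rank_percentile(samples, 100), "s")
```

With real data, the slowest observed startup and the tail percentiles
would show how much headroom 1 minute leaves under the stress load.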

I would not go as far as asking to figure out what the real cause of the
problem is, because there is no way to estimate how long that is going to
take, or even how to go about it. And in the meantime we still have
spurious failures in the OpenStack CI-loop.

Alternatively we could carry the workaround in the Xen package we build
for the OpenStack CI-loop, leaving xen-unstable "unfixed", but I think
that would be less desirable.

