[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v7 01/14] x86: add socket_to_cpumask



On Tue, May 19, 2015 at 11:18:20AM +0100, Jan Beulich wrote:
> >>> On 19.05.15 at 11:51, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
> > On Tue, May 19, 2015 at 08:31:53AM +0100, Jan Beulich wrote:
> >> >>> On 19.05.15 at 09:10, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
> >> > On Tue, May 19, 2015 at 07:52:04AM +0100, Jan Beulich wrote:
> >> >> >>> On 19.05.15 at 08:47, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
> >> >> > On Tue, May 19, 2015 at 07:28:49AM +0100, Jan Beulich wrote:
> >> >> >> >>> On 19.05.15 at 08:12, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
> >> >> >> > On Mon, May 18, 2015 at 02:21:40PM +0100, Jan Beulich wrote:
> >> >> >> >> >>> On 08.05.15 at 10:56, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
> >> >> >> >> > @@ -112,6 +115,8 @@ static int __devinit 
> >> >> >> >> > MP_processor_info_x(struct 
> >> >> > mpc_config_processor *m,
> >> >> >> >> >  {
> >> >> >> >> >         int ver, apicid, cpu = 0;
> >> >> >> >> >         
> >> >> >> >> > +       total_cpus++;
> >> >> >> >> > +
> >> >> >> >> >         if (!(m->mpc_cpuflag & CPU_ENABLED)) {
> >> >> >> >> >                 if (!hotplug)
> >> >> >> >> >                         ++disabled_cpus;
> >> >> >> >> 
> >> >> >> >> Is there a reason you can't use disabled_cpus and avoid adding yet
> >> >> >> >> another variable?
> >> >> >> > 
> >> >> >> > The problem is not with disabled_cpus but with num_processors, 
> >> >> >> > which
> >> >> >> > does not keep the original detected cpus in current code.
> >> >> >> > Hence 'total_cpus = disabled_cpus + num_processors' may not be 
> >> >> >> > correct
> >> >> >> > in some cases.
> >> >> >> 
> >> >> >> Please be more specific about when this is a problem (I do note that
> >> >> >> I'm aware that the equation will not always hold, but during my
> >> >> >> inspection while reviewing your change I didn't see that this would
> >> >> >> ever become problematic).
> >> >> > 
> >> >> > What I really need is the original cpu count enumerated from MADT. If
> >> >> > not introduce total_cpus then the only way getting it AFAICS is
> >> >> > 'disabled_cpus + num_processors'.
> >> >> > 
> >> >> > The problem is that MP_processor_info_x() have some earlier returns
> >> >> > before increasing num_processors. In those cases, the cpu detected 
> >> >> > will
> >> >> > neither counted to disabled_cpus nor num_processors, which means
> >> >> > 'disabled_cpus + num_processors' is potentially small than what I 
> >> >> > need.
> >> >> 
> >> >> As said - I understand this. But you still fail to explain under what
> >> >> (realistic, i.e. other than someone bogusly setting NR_CPUS=1)
> >> >> conditions this ends up being a problem.
> >> > 
> >> > As we calculate nr_sockets with:
> >> > 
> >> > nr_sockets = total_cpus / _cpus_per_socket__
> >> > 
> >> > If the calculated total_cpus is smaller than the actual cpu count on the
> >> > hardware, then the nr_sockets is also potentially smaller than the
> >> > actual socket count on the hardware. This is not the expectation.
> >> 
> >> Sure - but you still don't say what is going to go wrong. Remember,
> >> when I asked you to change to the total count I gave an explicit
> >> example: Use of "nosmp" would have yielded a zero nr_sockets in
> >> the earlier code. Yet with the sum of num_processors and
> >> disabled_cpus this can't happen anymore afaict.
> > 
> > "nosmp" only has side effect on max_cpus and nr_cpu_ids, but they are
> > never used at all when calculating nr_sockets. So I can't see any reason
> > why with "num_processors + disabled_cpus" the nr_sockets would not be
> > zero, I think this is a bug that I should fix in nosmp case.
> 
> Did you really mean to say "would _not_ be zero"? Because afaict
> the above resolves to "would ever be zero".

Yes, I mean "would ever be zero".

> 
> >> Hence I'm looking
> >> forward to you detailing the conditions under which you would see
> >> an issue without introducing total_cpus.
> > 
> > As said before, with "num_processors + disabled_cpus" I may get a
> > smaller nr_sockets than the machine actually has. This is my exact
> > problem: I may miss enumerating some CAT-enabled sockets. While the
> > assumption is that I will follow your suggestion to make nr_socket >=
> > the socket count that the machine actually has.
> 
> Please can you stop repeating yourself? Yes, nr_sockets can get
> underestimated this way, but the cases where total_cpus !=
> num_processors + disabled_cpus are - afaict - such that this in
> the end does no harm. Other than the original case of nr_sockets
> ending up being zero when "nosmp". So to convince me of the
> opposite I'm afraid you won't get around constructing an explicit
> example where things break (and hence my earlier hint regarding
> NR_CPUS=1, as I wouldn't count this as a valid example).

My original thought is that with newly introduced total_cpus, nr_sockets
is less likely to be underestimated than with "num_processors +
disabled_cpus", if not now but for the future. But perhaps I understood 
it incorrectly.

While this is really not a big deal for me. And "num_processors +
disabled_cpus" does work for me now so I feel comfortable to use it.

I do have a problem when coming up with the final codes, there are two
possible ways:
1)  Make both num_processors and disabled_cpus public so that they can
be used when calculating nr_sockets.
2)  Add another public function set_nr_sockets in mpparse.c (just like
set_nr_cpu_ids does).

Which do you prefer?
Thanks,

Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.