[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v7 01/14] x86: add socket_to_cpumask



>>> On 19.05.15 at 11:51, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
> On Tue, May 19, 2015 at 08:31:53AM +0100, Jan Beulich wrote:
>> >>> On 19.05.15 at 09:10, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
>> > On Tue, May 19, 2015 at 07:52:04AM +0100, Jan Beulich wrote:
>> >> >>> On 19.05.15 at 08:47, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
>> >> > On Tue, May 19, 2015 at 07:28:49AM +0100, Jan Beulich wrote:
>> >> >> >>> On 19.05.15 at 08:12, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
>> >> >> > On Mon, May 18, 2015 at 02:21:40PM +0100, Jan Beulich wrote:
>> >> >> >> >>> On 08.05.15 at 10:56, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
>> >> >> >> > @@ -112,6 +115,8 @@ static int __devinit 
>> >> >> >> > MP_processor_info_x(struct 
>> >> > mpc_config_processor *m,
>> >> >> >> >  {
>> >> >> >> >           int ver, apicid, cpu = 0;
>> >> >> >> >           
>> >> >> >> > + total_cpus++;
>> >> >> >> > +
>> >> >> >> >   if (!(m->mpc_cpuflag & CPU_ENABLED)) {
>> >> >> >> >           if (!hotplug)
>> >> >> >> >                   ++disabled_cpus;
>> >> >> >> 
>> >> >> >> Is there a reason you can't use disabled_cpus and avoid adding yet
>> >> >> >> another variable?
>> >> >> > 
>> >> >> > The problem is not with disabled_cpus but with num_processors, which
>> >> >> > does not keep the original detected cpus in current code.
>> >> >> > Hence 'total_cpus = disabled_cpus + num_processors' may not be 
>> >> >> > correct
>> >> >> > in some cases.
>> >> >> 
>> >> >> Please be more specific about when this is a problem (I do note that
>> >> >> I'm aware that the equation will not always hold, but during my
>> >> >> inspection while reviewing your change I didn't see that this would
>> >> >> ever become problematic).
>> >> > 
>> >> > What I really need is the original cpu count enumerated from MADT. If
>> >> > not introduce total_cpus then the only way getting it AFAICS is
>> >> > 'disabled_cpus + num_processors'.
>> >> > 
>> >> > The problem is that MP_processor_info_x() have some earlier returns
>> >> > before increasing num_processors. In those cases, the cpu detected will
>> >> > neither counted to disabled_cpus nor num_processors, which means
>> >> > 'disabled_cpus + num_processors' is potentially small than what I need.
>> >> 
>> >> As said - I understand this. But you still fail to explain under what
>> >> (realistic, i.e. other than someone bogusly setting NR_CPUS=1)
>> >> conditions this ends up being a problem.
>> > 
>> > As we calculate nr_sockets with:
>> > 
>> > nr_sockets = total_cpus / _cpus_per_socket__
>> > 
>> > If the calculated total_cpus is smaller than the actual cpu count on the
>> > hardware, then the nr_sockets is also potentially smaller than the
>> > actual socket count on the hardware. This is not the expectation.
>> 
>> Sure - but you still don't say what is going to go wrong. Remember,
>> when I asked you to change to the total count I gave an explicit
>> example: Use of "nosmp" would have yielded a zero nr_sockets in
>> the earlier code. Yet with the sum of num_processors and
>> disabled_cpus this can't happen anymore afaict.
> 
> "nosmp" only has side effect on max_cpus and nr_cpu_ids, but they are
> never used at all when calculating nr_sockets. So I can't see any reason
> why with "num_processors + disabled_cpus" the nr_sockets would not be
> zero, I think this is a bug that I should fix in nosmp case.

Did you really mean to say "would _not_ be zero"? Because afaict
the above resolves to "would ever be zero".

>> Hence I'm looking
>> forward to you detailing the conditions under which you would see
>> an issue without introducing total_cpus.
> 
> As said before, with "num_processors + disabled_cpus" I may get a
> smaller nr_sockets than the machine actually has. This is my exact
> problem: I may miss enumerating some CAT-enabled sockets. While the
> assumption is that I will follow your suggestion to make nr_socket >=
> the socket count that the machine actually has.

Please can you stop repeating yourself? Yes, nr_sockets can get
underestimated this way, but the cases where total_cpus !=
num_processors + disabled_cpus are - afaict - such that this in
the end does no harm. Other than the original case of nr_sockets
ending up being zero when "nosmp". So to convince me of the
opposite I'm afraid you won't get around constructing an explicit
example where things break (and hence my earlier hint regarding
NR_CPUS=1, as I wouldn't count this as a valid example).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.