
Re: [Xen-devel] [PATCH v2 03/23] x86: zero BSS using stosl instead of stosb

On 22/07/2015 06:18, Jan Beulich wrote:
>>>> Daniel Kiper <daniel.kiper@xxxxxxxxxx> 07/21/15 8:23 PM >>>
>> On Tue, Jul 21, 2015 at 03:37:48AM -0600, Jan Beulich wrote:
>>>>>> On 20.07.15 at 16:28, <daniel.kiper@xxxxxxxxxx> wrote:
>>> ... because of ??? Nowadays - with X86_FEATURE_ERMS - rep stosb
>>> is expected to be faster than rep stosl.
>> OK, I did not know about that. However, as far as I know, this feature
>> was introduced in 2012 with Ivy Bridge, so I suppose there are still
>> a lot of machines in the wild which do not support it.
>> Anyway, because this code is not performance critical, I am not going
>> to insist on one solution or the other. However, Andrew suggested this
>> change, so please agree with him on which direction we should go.
>> I will do whatever you agree on.
> ISTR having seen a similar patch from him(?), maybe in another area
> of code, before (or was it v1 of this one?), which I responded to with the
> same as above.

Indeed you have, several in fact.  I had not had a chance to delve into
the optimisation manuals, but have now taken a peek (Section 3.7.6).

If the source and destination are aligned on a 16-byte boundary
(which we can trivially arrange), ERMSB (to give it its Intel name)
and rep stosl differ only in setup cost; both still scale at the same
rate as the length grows.

Therefore, assuming we arrange for 16-byte alignment, rep stosl would
appear to cost a single ~60-cycle hit compared with ERMSB, but would be
substantially more efficient than rep stosb on a non-ERMSB system.
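For reference, whether a CPU has ERMSB is reported by CPUID leaf 7, subleaf 0, EBX bit 9. Below is a minimal user-space sketch, assuming GCC or Clang and their <cpuid.h> helper __get_cpuid_count(); cpu_has_erms() is a hypothetical name, and inside Xen one would presumably test X86_FEATURE_ERMS instead.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang x86-only helper header */
#endif

/* Hypothetical helper: return 1 if CPUID.(EAX=7,ECX=0):EBX[9] (ERMS)
 * is set, 0 otherwise (including on non-x86 builds). */
static int cpu_has_erms(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count() returns 0 if leaf 7 is not supported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> 9) & 1);
#else
    return 0;   /* Not x86: ERMS does not apply. */
#endif
}
```

On a pre-Ivy Bridge part this returns 0, which is exactly the case where plain rep stosb would be the slow option.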

Overall, I think 16-byte alignment plus rep stosl is the best compromise.
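The compromise amounts to something like the following minimal sketch. It is not Xen's actual BSS-clearing code: zero_dwords() is a hypothetical helper using GCC-style inline assembly, and its preconditions (16-byte-aligned start, length a multiple of 4) stand in for whatever the linker script would have to guarantee for the BSS bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero 'len' bytes at 'buf' four bytes at a time
 * with rep stosl. Assumes buf is 16-byte aligned and len % 4 == 0. */
static void zero_dwords(void *buf, size_t len)
{
    assert(((uintptr_t)buf & 15) == 0);   /* 16-byte aligned start */
    assert((len & 3) == 0);               /* whole dwords only */

    size_t count = len / 4;               /* number of 32-bit stores */

#if defined(__x86_64__) || defined(__i386__)
    void *dst = buf;
    /* (E/R)DI = destination, (E/R)CX = count, EAX = value stored.
     * The ABI guarantees DF is clear, so the stores move upward. */
    __asm__ volatile ("rep stosl"
                      : "+D" (dst), "+c" (count)
                      : "a" (0)
                      : "memory");
#else
    /* Portable fallback with the same effect, one dword at a time. */
    uint32_t *p = buf;
    while (count--)
        *p++ = 0;
#endif
}
```

Because the destination is aligned, the only extra cost relative to ERMSB should be the fixed setup overhead described above, not a per-byte penalty.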


Xen-devel mailing list
