Re: [Xen-devel] [PATCH v2 03/23] x86: zero BSS using stosl instead of stosb

To: Jan Beulich <jbeulich@xxxxxxxx>, daniel.kiper@xxxxxxxxxx
From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Date: Wed, 22 Jul 2015 09:42:55 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, keir@xxxxxxx
Delivery-date: Wed, 22 Jul 2015 08:43:15 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 22/07/2015 06:18, Jan Beulich wrote:
>>>> Daniel Kiper <daniel.kiper@xxxxxxxxxx> 07/21/15 8:23 PM >>>
>> On Tue, Jul 21, 2015 at 03:37:48AM -0600, Jan Beulich wrote:
>>>>>> On 20.07.15 at 16:28, <daniel.kiper@xxxxxxxxxx> wrote:
>>> ... because of ??? Nowadays - with X86_FEATURE_ERMS - rep stosb
>>> is expected to be faster than rep stosl.
>> OK, I did not know about that. However, as I know this feature
>> was introduced in 2012 with Ivy Bridge. So, I suppose that there
>> are still a lot of machines in the wild which does not support it.
>> Anyway, because this code is not performance critical I am not going
>> to insist on one or another solution. However, Andrew suggested that
>> thing, so, please agree with him in which direction we should go.
>> I will do what you agree.
> ISTR having seen a similar patch from him(?), maybe in another area
> of code, before (or was it v1 of this one?), which I responded to with the
> same as above.

Indeed you have, several in fact.  I had not had a chance to delve into
the optimisation manuals, but have taken a peek now (Section 3.7.6).

With the source and destination aligned on a 16-byte boundary (which
we can trivially arrange), ERMSB (to give it its Intel name) and
rep stosl differ only in the setup cost; they still scale at the same
rate for changes in length.

Therefore, assuming we arrange for 16-byte alignment, using rep stosl
would appear to be a single 60ish cycle hit over using ERMSB, but would
be substantially more efficient than using rep stosb on a non-ERMSB system.

Overall, I think 16-byte alignment and rep stosl is the best compromise.
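
For illustration only, here is a rough C sketch of the idea, not the
actual patch.  The helper name zero_region_stosl is made up, and it
assumes the caller has arranged 16-byte alignment and a length which is
a multiple of 4 (both checked with assert).  On non-x86 builds it falls
back to a plain loop so the sketch still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): zero len bytes at dst
 * using rep stosl, i.e. one 32-bit store per iteration.
 *
 * Assumptions: dst is 16-byte aligned, len is a multiple of 4, and the
 * direction flag is clear (as the usual C calling conventions require).
 */
static void zero_region_stosl(void *dst, size_t len)
{
    size_t count = len / 4;              /* number of 32-bit stores */

    assert(((uintptr_t)dst & 15) == 0);  /* 16-byte aligned start */
    assert((len & 3) == 0);              /* whole number of dwords */

#if defined(__x86_64__) || defined(__i386__)
    /* %edi/%rdi = destination, %ecx/%rcx = count, %eax = value (0). */
    __asm__ volatile ( "rep stosl"
                       : "+D" (dst), "+c" (count)
                       : "a" (0)
                       : "memory" );
#else
    /* Portable fallback so the sketch builds on other architectures. */
    uint32_t *p = dst;

    while ( count-- )
        *p++ = 0;
#endif
}
```

In the boot path the equivalent would be done directly in assembly, with
the linker script providing suitably aligned start and end symbols for
the BSS; that part is assumed here rather than shown.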

~Andrew

