[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to handle retry



> -----Original Message-----
> From: Paul Durrant
> Sent: 09 July 2015 14:43
> To: 'Jan Beulich'
> Cc: Andrew Cooper; xen-devel@xxxxxxxxxxxxxxxxxxxx; Keir (Xen.org)
> Subject: RE: [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to
> handle retry
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > Sent: 09 July 2015 14:38
> > To: Paul Durrant
> > Cc: Andrew Cooper; xen-devel@xxxxxxxxxxxxxxxxxxxx; Keir (Xen.org)
> > Subject: RE: [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to
> > handle retry
> >
> > >>> On 09.07.15 at 14:50, <Paul.Durrant@xxxxxxxxxx> wrote:
> > >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > >> Sent: 09 July 2015 13:04
> > >> >>> On 09.07.15 at 13:11, <Paul.Durrant@xxxxxxxxxx> wrote:
> > >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > >> >> Sent: 09 July 2015 11:05
> > >> >> >>> On 03.07.15 at 18:25, <paul.durrant@xxxxxxxxxx> wrote:
> > >> >> > @@ -287,17 +271,56 @@ static int hvmemul_do_io_addr(
> > >> >> >      bool_t is_mmio, paddr_t addr, unsigned long *reps,
> > >> >> >      unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
> > >> >> >  {
> > >> >> > -    struct page_info *ram_page;
> > >> >> > +    struct vcpu *v = current;
> > >> >> > +    unsigned long ram_gmfn = paddr_to_pfn(ram_gpa);
> > >> >> > +    unsigned int page_off = ram_gpa & (PAGE_SIZE - 1);
> > >> >> > +    struct page_info *ram_page[2];
> > >> >> > +    int nr_pages = 0;
> > >> >>
> > >> >> unsigned int
> > >> >
> > >> > No, it's intentionally signed because the unwind code at the end of
> the
> > >> > function is:
> > >> >
> > >> >     while ( --nr_pages >= 0 )
> > >> >         hvmemul_release_page(ram_page[nr_pages]);
> > >> >
> > >> > I.e. the loop terminates when nr_pages gets to -1.
> > >>
> > >>      while ( nr_pages-- )
> > >>          hvmemul_release_page(ram_page[nr_pages]);
> > >>
> > >
> > > But that wouldn't be correct, since the loop would try to release
> > > ram_page[1] and ram_page[0] in the case where only ram_page[0] had
> > been
> > > acquired. It would have to be:
> > >
> > >       while ( nr_pages )
> > >           hvmemul_release_page(ram_page[--nr_pages]);
> >
> > Unless you need the value of nr_pages after the loop, I can't see
> > the difference between your and my proposal - the decrement in
> > both cases happens between determining whether to continue the
> > loop and accessing the intended array element.
> 
> D'oh. Of course it does.
> 
> >
> > > which I'll do, since you so dislike signed ints.
> >
> > Sounds like you're not understanding why. Just look at the
> > generated code when using signed vs unsigned int as array
> > index.
> 
> Ok, I'll take a look because I would not have thought it would make a huge
> difference.
> 

So.... Two test programs:

test1.c:
-------

#include <stdio.h>

int main(void)
{
  int i = 10;
  int a[10];

  while ( --i >= 0 ) {
    a[i] = i;
    printf("%d\n", i);
  }

  return 0;
}

test2.c:
-------

#include <stdio.h>

int main(void)
{
  unsigned int i = 10;
  int a[10];

  while ( i ) {
    a[--i] = i;
    printf("%d\n", i);
  }

  return 0;
}

I compiled them just using 'make test1' and 'make test2', so using whatever the 
default options are for gcc (on a debian wheezy box). The relevant bits of the 
objdump are:

test1
-----

000000000040050c <main>:
  40050c:       55                      push   %rbp
  40050d:       48 89 e5                mov    %rsp,%rbp
  400510:       48 83 ec 30             sub    $0x30,%rsp
  400514:       c7 45 fc 0a 00 00 00    movl   $0xa,-0x4(%rbp)
  40051b:       eb 20                   jmp    40053d <main+0x31>
  40051d:       8b 45 fc                mov    -0x4(%rbp),%eax
  400520:       48 98                   cltq   
  400522:       8b 55 fc                mov    -0x4(%rbp),%edx
  400525:       89 54 85 d0             mov    %edx,-0x30(%rbp,%rax,4)
  400529:       8b 45 fc                mov    -0x4(%rbp),%eax
  40052c:       89 c6                   mov    %eax,%esi
  40052e:       bf fc 05 40 00          mov    $0x4005fc,%edi
  400533:       b8 00 00 00 00          mov    $0x0,%eax
  400538:       e8 a3 fe ff ff          callq  4003e0 <printf@plt>
  40053d:       83 6d fc 01             subl   $0x1,-0x4(%rbp)
  400541:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  400545:       79 d6                   jns    40051d <main+0x11>
  400547:       b8 00 00 00 00          mov    $0x0,%eax
  40054c:       c9                      leaveq 
  40054d:       c3                      retq   
  40054e:       90                      nop
  40054f:       90                      nop

test2
-----

000000000040050c <main>:
  40050c:       55                      push   %rbp
  40050d:       48 89 e5                mov    %rsp,%rbp
  400510:       48 83 ec 30             sub    $0x30,%rsp
  400514:       c7 45 fc 0a 00 00 00    movl   $0xa,-0x4(%rbp)
  40051b:       eb 22                   jmp    40053f <main+0x33>
  40051d:       83 6d fc 01             subl   $0x1,-0x4(%rbp)
  400521:       8b 55 fc                mov    -0x4(%rbp),%edx
  400524:       8b 45 fc                mov    -0x4(%rbp),%eax
  400527:       89 54 85 d0             mov    %edx,-0x30(%rbp,%rax,4)
  40052b:       8b 45 fc                mov    -0x4(%rbp),%eax
  40052e:       89 c6                   mov    %eax,%esi
  400530:       bf fc 05 40 00          mov    $0x4005fc,%edi
  400535:       b8 00 00 00 00          mov    $0x0,%eax
  40053a:       e8 a1 fe ff ff          callq  4003e0 <printf@plt>
  40053f:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  400543:       75 d8                   jne    40051d <main+0x11>
  400545:       b8 00 00 00 00          mov    $0x0,%eax
  40054a:       c9                      leaveq 
  40054b:       c3                      retq   
  40054c:       90                      nop
  40054d:       90                      nop
  40054e:       90                      nop
  40054f:       90                      nop

So, the only significant difference I can see is that using a signed value 
causes an extra cltq instruction to be used. Or maybe I need some specific 
level of optimization for there to be a big difference?

  Paul

>   Paul
> 
> >
> > Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.