RE: [Xen-devel] a question about popen() performance on domU
> -----Original Message-----
> From: xuehai zhang [mailto:hai@xxxxxxxxxxxxxxx]
> Sent: 24 November 2005 15:41
> To: Petersson, Mats
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Tim Freeman; Kate Keahey
> Subject: Re: [Xen-devel] a question about popen() performance on domU
>
> See comments below.
>
> Thanks Mats. I have more questions about your comments below.
>
> Xuehai
>
>>> -----Original Message-----
>>> From: xuehai zhang [mailto:hai@xxxxxxxxxxxxxxx]
>>> Sent: 24 November 2005 14:02
>>> To: Petersson, Mats
>>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Tim Freeman; Kate Keahey
>>> Subject: Re: [Xen-devel] a question about popen() performance on domU
>>>
>>> Mats,
>>>
>>> Thanks a lot for the response.
>>>
>>>> I did have a look at popen, and essentially, it does the following
>>>> [the real code is MUCH more complicated, doing lots of
>>>> open/dup/close on pipes and stuff]:
>>>>   if (!fork())
>>>>       exec("/bin/sh", "sh", "-c", cmd, NULL);
>>>
>>> I took a look at the popen source code too yesterday and the above
>>> lines are the essential part. A thread on a GNU list
>>> (http://lists.gnu.org/archive/html/bug-global/2005-06/msg00001.html)
>>> suggests popen() might depend on how fast /bin/sh is executed. On
>>> both my VM and the physical machine, the kernel version is 2.6.11,
>>> the glibc version is 2.3.2.ds1-21, and /bin/sh is linked to
>>> /bin/bash. I also looked for differences in the shared libraries
>>> used by /bin/sh on both machines and found that /bin/sh on the
>>> physical machine uses libraries from /lib/tls, while on the VM this
>>> directory is disabled.
>>>
>>> VM$ ldd /bin/sh
>>>         libncurses.so.5 => /lib/libncurses.so.5 (0xb7fa7000)
>>>         libdl.so.2 => /lib/libdl.so.2 (0xb7fa3000)
>>>         libc.so.6 => /lib/libc.so.6 (0xb7e70000)
>>>         /lib/ld-.so.2 => /lib/ld-.so.2 (0xb7fea000)
>>>
>>> PHYSICAL$ ldd /bin/sh
>>>         libncurses.so.5 => /lib/libncurses.so.5 (0xb7fa6000)
>>>         libdl.so.2 => /lib/tls/libdl.so.2 (0xb7fa2000)
>>>         libc.so.6 => /lib/tls/libc.so.6 (0xb7e6d000)
>>>         /lib/ld-.so.2 => /lib/ld-.so.2 (0xb7fea000)
>>
>> In this particular case, I would think that lib/tls is not a factor,
>> but it may be worth disabling the tls libraries on the physical
>> machine too, just to make sure... [just "mv /lib/tls
>> /lib/tls.disabled" should do it].
>
> I don't think /lib/tls is the factor either. I did rerun the tests
> with tls disabled on the physical machine and it gave even worse
> performance for the tests, so I switched it back.
>
>>>> The fork creates another process, which then executes /bin/sh,
>>>> which again causes another fork/exec to take place in the effort of
>>>> executing the actual command given.
>>>>
>>>> So the major component of popen would be fork() and execl(), both
>>>> of which cause, amongst other things, a lot of page-table work and
>>>> task-switching.
>>>>
>>>> Note that popen is implemented in glibc [I took the 2.3.6 source
>>>> code from www.gnu.org for my look at this], so there's no
>>>> difference in the implementation of popen itself - the difference
>>>> lies in how the Linux kernel handles fork() and exec(), but maybe
>>>> more importantly, how task-switches and page-tables are handled in
>>>> native Linux and Xen-Linux. Because Xen keeps track of the
>>>> page-tables on top of Linux's handling of page-tables, you get
>>>> some extra work here. So, it should really be slower on Xen than
>>>> on native Linux.
>>>> [In fact, the question came up not so long ago, why Xen was SLOWER
>>>> than native Linux on popen (and some others) in a particular
>>>> benchmark, and the result of that investigation was that it's down
>>>> to, mainly, task-switching taking longer in Xen.]
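As an aside, the fork/exec sequence described above can be illustrated
with a minimal, self-contained sketch of popen(cmd, "r"). This is a
simplified illustration only, not the actual glibc implementation, which
also supports "w" mode, records the child pid so that pclose() can
waitpid() on it, and handles more of the error paths:

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Simplified popen(cmd, "r"): pipe + fork + exec of "/bin/sh -c cmd". */
    static FILE *popen_sketch(const char *cmd)
    {
        int fds[2];
        pid_t pid;

        if (pipe(fds) < 0)
            return NULL;

        pid = fork();
        if (pid == 0) {                    /* child: make the pipe its stdout */
            close(fds[0]);
            dup2(fds[1], STDOUT_FILENO);
            close(fds[1]);
            execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
            _exit(127);                    /* only reached if exec fails */
        }

        close(fds[1]);                     /* parent keeps only the read end */
        if (pid < 0) {
            close(fds[0]);
            return NULL;
        }
        return fdopen(fds[0], "r");
    }

The matching pclose() then calls waitpid() on the recorded child pid,
which is why waitpid features in the strace output discussed below.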
>>> I agree with your explanation about Xen being SLOWER than native
>>> Linux on popen because of the longer task-switching in Xen. The
>>> problem I met (popen runs faster on a Xen VM than on the physical
>>> machine) looks abnormal. I ran several home-made benchmarking
>>> programs and used the "strace" tool to trace system call
>>> performance. The first program tests the performance of both popen
>>> and pclose (a loop of popen calls, each followed by a pclose call);
>>> the source of the program and the strace results are available at
>>> http://people.cs.uchicago.edu/~hai/tmp/gt2gram/strace-popen/strace.txt.
>>> The results show the waitpid syscall costs more time on the physical
>>> machine than on the VM (see the usecs/call value in the following
>>> table).
>>>
>>>                     % time     seconds  usecs/call     calls    errors  syscall
>>>                     ------ ----------- ----------- --------- ---------  -------
>>> VM:                  63.43    0.127900        6395        20            waitpid
>>> PHYSICAL MACHINE:    93.87    0.532498       26625        20            waitpid
>>>
>>> waitpid is called by pclose, as shown in the glibc source code. So my
>>> original post questioning the performance of popen should take pclose
>>> into consideration too. A more accurate question I should have posted
>>> is: popen+pclose executes faster on my VM than on my physical
>>> machine. The popen/pclose benchmark narrows the problem down to
>>> waitpid: waitpid somehow is suffering on the physical machine.
>>>
>>> So, I did a follow-up experiment to test fork and waitpid performance
>>> on both machines. The program is a loop of fork calls, each followed
>>> by a waitpid call. The source of the program and the strace results
>>> are available at
>>> http://people.cs.uchicago.edu/~hai/tmp/gt2gram/strace-fork/strace.txt.
>>> The strace results confirm that waitpid costs more time on the
>>> physical machine (154 usec/call) than on the VM (56 usec/call).
>>> However, the program runs faster on the physical machine (unlike the
>>> popen/pclose program), and the results suggest the fork syscall used
>>> on the VM costs more time than the clone syscall on the physical
>>> machine. I have a question here: why does the physical machine use
>>> the clone syscall rather than the fork syscall for the same program?
>>
>> Because it's using the same source for glibc! glibc says to use
>> _IO_fork(), which is calling the fork syscall. Clone would probably do
>> the same thing, but for whatever good or bad reason, the author(s) of
>> this code chose to use fork. There may be good reasons, or no reason
>> at all, to do it this way. I couldn't say. I don't think it makes a
>> whole lot of difference if the actual command executed by popen is
>> actually "doing something", rather than just an empty "return".
>
> Do you have any suggestion why the same code uses different syscalls
> on two machines which have the same kernel and glibc?

That I can't explain. I guess one possibility is that in some way, the
fork() call gets translated to clone() at some other level. I did a grep
for _IO_fork in the source for glibc, and it comes back as
#define _IO_fork fork.
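The follow-up benchmark described above is essentially a loop of fork()
followed by waitpid(). A minimal sketch of that shape is below; it is an
illustration only, not the program posted at the URL above, and the
iteration count is arbitrary. Running it under "strace -c" produces a
per-syscall summary like the table quoted earlier:

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Loop of fork() + waitpid(); run as "strace -c ./a.out". */
    int main(void)
    {
        int i, status;

        for (i = 0; i < 20; i++) {
            pid_t pid = fork();
            if (pid == 0)
                _exit(0);                  /* child exits immediately */
            else if (pid > 0)
                waitpid(pid, &status, 0);  /* parent reaps the child */
        }
        return 0;
    }

On the fork-vs-clone question: one possibility, not established in this
thread, is that it ties back to the /lib/tls difference noted earlier --
the NPTL-enabled glibc in /lib/tls implements fork() via the clone
syscall, while the non-TLS libc issues a plain fork syscall, which would
match the traces seen on the two machines.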
>>>> The reason it is not would probably have something to do with the
>>>> differences in hardware on Linux vs. Xen platforms, perhaps the
>>>> fact that your file-system is a virtual block-device and thus lives
>>>> inside a file that is perhaps better cached or otherwise handled in
>>>> a different way on the Xen system.
>>>
>>> Let me describe the hardware context of my VM and physical machine.
>>> The host of my VM and the physical machine I tested against the VM
>>> are two nodes of a physical cluster with the same hardware
>>> configuration (dual Intel PIII 498.799 MHz CPUs, 512MB memory, a 4GB
>>> HD with the same partitions). The physical machine is rebooted with
>>> "nosmp". The VM host is rebooted into Xen with "nosmp" (Xen version
>>> information is "Latest ChangeSet: 2005/05/03 17:30:40 1.1846
>>> 4277a730mvnFSFXrxJpVRNk8hjD4Vg"). Xen dom0 is assigned 96MB memory,
>>> and the VM is the only user domain running on the VM host, with
>>> 395MB memory. Both dom0 and the VM are pinned to CPU 0.
>>>
>>> Yes, the backends of the VM's VBDs are loopback files in dom0. Three
>>> loopback files are used to map to three partitions inside the VM. I
>>> actually thought about the possible caching effect of the VM's VBD
>>> backends, but I am not sure how to verify it and compare it with the
>>> physical machine. Is it possible that Xen has a different assurance
>>> of writing back than the physical machine, that is, the data is kept
>>> in memory longer before it is actually written to disk?
>>
>> Xen itself doesn't know ANYTHING about the disk/file where the data
>> for Dom0 or DomU comes from, so no, Xen would not do that. However,
>> the loopback file-system that is involved in VBDs would potentially
>> do things that are different from the actual hardware.
>
> So, there is a possibility that the loopback file-system can do
> something tricky like caching and results in better performance for
> applications running inside the VM?
>
>> I think you should be able to mount the virtual disk as a "device" on
>> your system.
>
> What does "your system" here refer to? Does it mean dom0 or inside of
> domU?

Your system here refers to "PHYSICAL".

>> I don't know off the top of my head how to do that, but essentially
>> something like this:
>>   mount myimage.hdd loop/ -t ext3 [additional parameters may be needed]
>>
>> You could then do "chroot loop/", and perform your tests there. This
>> should execute the same thing from the same place on native Linux as
>> you would in DomU.
>>
>> Now, this may not run faster on native than your original setup, but
>> I wouldn't be surprised if it does...
>
> This is interesting. I will try to run the same tests if I can mount
> the virtual disk as a "device" successfully.

Please share the results... ;-)

> Thanks.
>
> Xuehai
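At the syscall level, the suggestion above amounts to binding the image
file to a loop device, mounting it, and chrooting into it before
re-running the tests (in practice "mount -o loop" does the first two
steps for you). A rough sketch follows; the image name myimage.hdd and
mount point loop/ follow the example above, while /dev/loop0 being free,
the ext3 filesystem type, and root privileges are assumptions:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/mount.h>
    #include <linux/loop.h>

    int main(void)
    {
        /* Bind the disk image to a loop device. */
        int img = open("myimage.hdd", O_RDWR);
        int loop = open("/dev/loop0", O_RDWR);
        if (img < 0 || loop < 0 || ioctl(loop, LOOP_SET_FD, img) < 0) {
            perror("loop setup");
            return 1;
        }

        /* Mount the image on the existing directory loop/ and make it
         * the root directory. */
        if (mount("/dev/loop0", "loop", "ext3", 0, NULL) < 0 ||
            chroot("loop") < 0 || chdir("/") < 0) {
            perror("mount/chroot");
            return 1;
        }

        /* Start a shell inside the image, so the tests run against the
         * same binaries and libraries that the domU sees. */
        execl("/bin/sh", "sh", (char *)NULL);
        perror("execl");
        return 1;
    }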
>>>> Now, I'm not saying that there isn't a possibility that something
>>>> is managed differently in Xen that makes this run faster - I just
>>>> don't really see how that would be likely, since everything that
>>>> happens in the system is going to be MORE complicated by the extra
>>>> layer of Xen involved.
>>>>
>>>> If anyone else has some thoughts on this subject, it would be
>>>> interesting to hear.
>>>
>>> I agree. But given that the VM has the same hardware/software
>>> configuration as the physical machine, it still looks abnormal to me
>>> that it runs faster. I wonder if there are any other more efficient
>>> debugging strategies I can use to investigate it. I would appreciate
>>> it if anyone has any more suggestions.
>>>
>>> Thanks again.
>>>
>>> Xuehai
>>>
>>>>> -----Original Message-----
>>>>> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>>>>> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of xuehai zhang
>>>>> Sent: 23 November 2005 20:26
>>>>> To: xen-devel@xxxxxxxxxxxxxxxxxxx
>>>>> Cc: Tim Freeman; Kate Keahey
>>>>> Subject: [Xen-devel] a question about popen() performance on domU
>>>>>
>>>>> Dear all,
>>>>> When I compared the performance of some application on both a Xen
>>>>> domU and a standard Linux machine (where the domU runs on a similar
>>>>> physical machine), I noticed the application runs faster on the
>>>>> domU than on the physical machine. Instrumenting the application
>>>>> code shows the application spends more time on popen() calls on
>>>>> domU than on the physical machine. I wonder if xenlinux does some
>>>>> special modification of the popen code to improve its performance
>>>>> over the original Linux popen code?
>>>>> Thanks in advance for your help.
>>>>> Xuehai

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel