
Re: [Xen-users] 1000 Domains: Not able to access Domu via xm console from Dom0




On 13 December 2012 14:58, Paul Harvey <jhebus@xxxxxxxxxxxxxx> wrote:



On 13 December 2012 12:36, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
On Thu, 2012-12-13 at 12:24 +0000, Paul Harvey wrote:
> So, I attached strace to xenconsoled to see if I could find what was
> going on, and I got this:
>
> ioctl(1023, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon
> echo ...}) = 0
> ioctl(1023, TIOCGPTN, [345])            = 0
> stat("/dev/pts/345", {st_mode=S_IFCHR|0620, st_rdev=makedev(136,
> 345), ...}) = 0
> open("/dev/pts/345", O_RDWR|O_NOCTTY)   = -1 EMFILE (Too many open
> files)
> close(1023)                             = 0
> write(2, "Failed to create tty for domain-"..., 70) = 70
> open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 1023
> fstat(1023, {st_mode=S_IFREG|0644, st_size=3661, ...}) = 0
> fstat(1023, {st_mode=S_IFREG|0644, st_size=3661, ...}) = 0
>
>
> So this is definitely a problem with file limits, but I don't
> understand why, as the current limit on files per process is 65000

I wrote the following yesterday and although I see it in my sent box I
can't see it in the list archives and you don't seem to have received it
either. I've no idea where it got to...


On Tue, 2012-12-11 at 22:07 +0000, Paul Harvey wrote:
> On 7 December 2012 10:03, Ian
> Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
>         On Thu, 2012-12-06 at 23:27 +0000, Paul Harvey wrote:
>
>         > Any help, or is this a limitation of Xen?
>
>
>         One limit you might be hitting is the number of event channels
>         which dom0 can handle. The maximum is currently 1024 for 32-bit
>         domains and 4096 for 64-bit (that's per domain, not total in the
>         system). Depending on the configuration of the mini-os domains
>         (e.g. number of devices etc.) you might be hitting this --
>         "lsevtchn 0" might give a clue if this is happening (that tool
>         is in /usr/lib/xen somewhere).
>
>         Work has just started on expanding these limits to ~32k and
>         ~512k for 32- and 64-bit domains respectively, the hope is that
>         this will be done in time for 4.3. Look for posts from Wei Liu
>         on xen-devel this week.
>
>         If you aren't hitting the evtchn limits then maybe you are
>         hitting some dom0 OS level limitation, i.e. a ulimit on the
>         number of open file descriptors which xenconsoled can have or
>         some limit on the number of pty's.
>
>         Ian.
>
>
> Hi Ian,
>
>
> Thanks for the quick reply!
>
>
> Have looked into your suggestions and:
>
>
> * It is NOT the number of evtchns; this is much less than the limits
> you mention

OOI how many event channels do your 1000 domains require?
 

> * It is NOT the number of allowable PTY's, the number used is much
> less than the limit

Again OOI how many?

> * The number of per-process file descriptors was set to 1024, but I
> have increased this to thousands:
> ulimit -n
> 10240

Did you apply this to the xenconsoled and other daemon processes too?
Setting ulimit only affects the current process and its children.
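
A minimal Python sketch of the point above: a limit changed inside one process is not seen by a process that is already running elsewhere, which is why a daemon started earlier keeps its old value. The helper name limit_in_child and the value 256 are illustrative only and do not come from this thread.

```python
import resource
import subprocess
import sys


def limit_in_child(new_soft):
    """Lower RLIMIT_NOFILE inside a child process and return what the child sees."""
    code = (
        "import resource\n"
        "soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)\n"
        f"resource.setrlimit(resource.RLIMIT_NOFILE, ({new_soft}, hard))\n"
        "print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])\n"
    )
    out = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


# The parent's soft limit is read before and after the child changes its own.
before = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
child_soft = limit_in_child(256)
after = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
print("parent before:", before, "child:", child_soft, "parent after:", after)
```

The child reports the value it set for itself, while the parent's own limit is unchanged. The same separation applies between a login shell and a daemon that was started before the limit was raised.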

> To hammer this point home, I built a wee C file to allocate pty's.
> Before I changed the limit I got problems past 1024; now it works fine
> as root, or any user.
>
>
> But, when i create ~350 domains:
>
>
> ls /proc/<xenconsoled>/fd | wc -l
> 1024
>
>
> only ever goes as high as 1024, and does not increase for subsequently
> added domains.

I suspect you haven't actually increased the ulimit for this process.
What does /proc/<xenconsoled>/limits contain?

There may also be sysctls which limit the number of fds a process can
have.
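
The check suggested above can be scripted. This is a rough Python sketch, assuming a Linux /proc layout like the dumps shown in this thread. The function names parse_open_files_limit and fd_usage are made up for illustration, and reading another user's /proc/<pid>/fd normally requires root.

```python
import os
import re


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of a /proc/<pid>/limits dump."""
    match = re.search(r"^Max open files\s+(\S+)\s+(\S+)", limits_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Max open files' row found")
    # Each value is a decimal integer or the literal word "unlimited".
    return tuple(v if v == "unlimited" else int(v) for v in match.groups())


def fd_usage(pid):
    """Return (descriptors in use, soft limit) for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        soft, _hard = parse_open_files_limit(handle.read())
    in_use = len(os.listdir(f"/proc/{pid}/fd"))
    return in_use, soft


sample = "Max open files            1024                 1024                 files\n"
print(parse_open_files_limit(sample))
```

Calling fd_usage with the daemon's pid gives both numbers at once. If the count sits at the soft limit, the limit really is the bottleneck for that process.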

> Any other ideas?

> Also, as a side note, any idea why the domain creation time grows
> quadratically?

Grows with the number of running domains you mean?

There were some memory allocator optimisations discussed on xen-devel
recently, but I don't recall the details enough to know if it is
relevant here, it could be that though. Other than that I'm afraid I've
no ideas.

Ian.




Hi Ian, 

Thanks for getting back to me :)

So:

./lsevtchn 1000
   1: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 72
   2: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 73

cat /proc/sys/kernel/pty/max
4096

# with 338 domains; there were 9 system ones before starting
cat /proc/sys/kernel/pty/nr 
347

I changed the configuration file /etc/security/limits.conf and rebooted the machines, assuming that this would apply the new limits to the daemons, but you were right. The running xenconsoled still had the old limit:

cat /proc/5388/limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             87439                87439                processes 
Max open files            1024                 1024                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       87439                87439                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us       


I killed all the domains and restarted xenconsoled. This applied the new limits:

cat /proc/27677/limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             87439                87439                processes 
Max open files            65000                65000                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       87439                87439                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us       

BUT:

There is now a buffer overflow happening somewhere, which crashes the daemon when creating the 340th domain, as shown by strace:

write(4, "\v\0\0\0\0\0\0\0\0\0\0\0+\0\0\0", 16) = 16
write(4, "/local/domain/1020/console/tty\0", 31) = 31
write(4, "/dev/pts/345", 12)            = 12
futex(0xd95124, FUTEX_WAIT_PRIVATE, 14161, NULL) = 0
futex(0xd950f8, FUTEX_WAKE_PRIVATE, 1)  = 0
rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER, 0x7fb5d50284a0}, NULL, 8) = 0
fcntl(1026, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
open("/dev/tty", O_RDWR|O_NOCTTY|O_NONBLOCK) = -1 ENXIO (No such device or address)
writev(2, [{"*** ", 4}, {"buffer overflow detected", 24}, {" ***: ", 6}, {"/usr/lib/xen-4.1/bin/xenconsoled", 32}, {" terminated\n", 12}], 5) = 78
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb5d5eb3000
open("/usr/lib/xen-4.1/bin/../lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 1028
fstat(1028, {st_mode=S_IFREG|0644, st_size=85812, ...}) = 0
mmap(NULL, 85812, PROT_READ, MAP_PRIVATE, 1028, 0) = 0x7fb5d5e9e000
close(1028)                             = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 1028
read(1028, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320(\0\0\0\0\0\0"..., 832) = 832
fstat(1028, {st_mode=S_IFREG|0644, st_size=88384, ...}) = 0
mmap(NULL, 2184216, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 1028, 0) = 0x7fb5cf9d1000
mprotect(0x7fb5cf9e6000, 2093056, PROT_NONE) = 0
mmap(0x7fb5cfbe5000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 1028, 0x14000) = 0x7fb5cfbe5000
close(1028)                             = 0
mprotect(0x7fb5cfbe5000, 4096, PROT_READ) = 0
munmap(0x7fb5d5e9e000, 85812)           = 0
futex(0x7fb5d53aedf0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7fb5cfbe61a4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "======= Backtrace: =========\n", 29) = 29
writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"__fortify_fail", 14}, {"+0x", 3}, {"37", 2}, {")", 1}, {"[0x", 3}, {"7fb5d50fc807", 12}, {"]\n", 2}], 9) = 69
writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"+0x", 3}, {"109700", 6}, {")", 1}, {"[0x", 3}, {"7fb5d50fb700", 12}, {"]\n", 2}], 8) = 59
writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"+0x", 3}, {"10a7be", 6}, {")", 1}, {"[0x", 3}, {"7fb5d50fc7be", 12}, {"]\n", 2}], 8) = 59
writev(2, [{"/usr/lib/xen-4.1/bin/xenconsoled", 32}, {"[0x", 3}, {"403cb8", 6}, {"]\n", 2}], 4) = 43
writev(2, [{"/usr/lib/xen-4.1/bin/xenconsoled", 32}, {"[0x", 3}, {"4021d5", 6}, {"]\n", 2}], 4) = 43
writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"__libc_start_main", 17}, {"+0x", 3}, {"ed", 2}, {")", 1}, {"[0x", 3}, {"7fb5d501376d", 12}, {"]\n", 2}], 9) = 72
writev(2, [{"/usr/lib/xen-4.1/bin/xenconsoled", 32}, {"[0x", 3}, {"4022ad", 6}, {"]\n", 2}], 4) = 43
write(2, "======= Memory map: ========\n", 29) = 29
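
One possible reading of this trace, not confirmed anywhere in the thread: the abort comes right after the daemon starts touching descriptors numbered above 1023 (fcntl(1026, ...)), and the backtrace goes through glibc's __fortify_fail. That is the pattern glibc's fortified FD_SET check produces when a descriptor does not fit in a select() fd_set, since FD_SETSIZE is 1024 on Linux. This assumes xenconsoled 4.1 multiplexes its descriptors with select(), which the trace itself does not show. The Python sketch below (the helper name probe_high_fd is made up for illustration) shows the same ceiling from a safe distance: CPython raises ValueError instead of aborting, while poll() accepts the same descriptor.

```python
import fcntl
import os
import resource
import select


def probe_high_fd(min_fd=1024):
    """Try select() and poll() on a descriptor numbered >= min_fd.

    Returns (select_rejected, poll_worked), or None if the open-file
    limit cannot be raised far enough to create such a descriptor.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = min_fd + 16
    raised = False
    if soft != resource.RLIM_INFINITY and soft < needed:
        if hard != resource.RLIM_INFINITY and hard < needed:
            return None  # the hard limit is too low to run the probe
        resource.setrlimit(resource.RLIMIT_NOFILE, (needed, hard))
        raised = True
    read_end, write_end = os.pipe()
    high = None
    try:
        # Duplicate the pipe onto the lowest free descriptor >= min_fd.
        high = fcntl.fcntl(read_end, fcntl.F_DUPFD, min_fd)
        try:
            select.select([high], [], [], 0)
            select_rejected = False
        except ValueError:
            # CPython refuses descriptors that do not fit in an fd_set.
            select_rejected = True
        poller = select.poll()
        poller.register(high, select.POLLIN)
        poller.poll(0)  # poll() takes a list of fds, so it has no such ceiling
        return select_rejected, True
    finally:
        for fd in (high, read_end, write_end):
            if fd is not None:
                os.close(fd)
        if raised:
            resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


result = probe_high_fd()
print(result)
```

If this reading is right, raising the open-file limit alone would not be enough: any code path that puts descriptor 1024 or higher into an fd_set would still fail.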


On 13 December 2012 15:27, Paul Harvey <jhebus@xxxxxxxxxxxxxx> wrote:
Sorry, I thought that I had pressed reply-all


On 13 December 2012 15:19, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
Please can you keep this conversation on the mailing list.

On Thu, 2012-12-13 at 15:12 +0000, Paul Harvey wrote:
[...]




_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users

 

