
[SPAM] Re: [Xen-API] [SPAM] XCP 1.0beta and vastSky and how "far" I got with it.



Hi Henrik,

Thanks for trying out vastsky on XCP 1.0 beta.
Testing is not complete yet, and there are at least two issues (not related to
yours) that need to be addressed to get it working.
One of them requires a dom0 kernel bug fix, for which I am waiting on the next update release.
So please be patient for a little while ;)

We can discuss your vastsky specific issue at
vastsky-devel@xxxxxxxxxxxxxxxxxxxxx

Thanks,
Tomoe


On 12/21/2010 05:07 PM, Henrik Andersson wrote:
I also sent this email to xen-users list, I hope it's ok.

Hello all. I'm trying to figure out vastSky on XCP 1.0 beta. As we all know, it 
has been integrated into XCP. That's just about all one can find about the 
matter. I've been googling a lot, but with no luck. I'll write here what 
information I've gathered, what I tried and how far I managed to get with this.

I include information about my hardware, in case it has something to do with all 
this. I have one four-node SuperMicro Twin2 server (2026TT-HiBQRF) with QDR 
InfiniBand (I haven't bought a switch or managed to get drivers for dom0 yet, so 
it's gigabit Ethernet for now). Each node is identical, containing: 1x Intel Xeon 
E5620, 12 GB DDR3, 3x 60 GB OCZ Vertex2 SSD and 3x 500 GB Seagate Momentus 7200.4 SATA 
2.5". No RAID cards, just the onboard ICH10.

Networking configuration:

node A: hostname: super0nodeA ip: 192.168.10.210
node B: hostname: super0nodeB ip: 192.168.10.211
node C: hostname: super0nodeC ip: 192.168.10.212
node D: hostname: super0nodeD ip: 192.168.10.213

I have bonded two interfaces on each node, have only one gigabit switch and 
haven't done any multipath configuration.

My plan was to use super0nodeA as the storage manager and super0nodeB, super0nodeC 
and super0nodeD as storage servers, but I ended up installing the storage and head 
server on super0nodeA as well.

http://sourceforge.net/apps/mediawiki/vastsky/index.php?title=Main_Page
seems to be a good starting point. If one clicks "Install manual" on the left, one gets 
to: 
http://vastsky.svn.sourceforge.net/viewvc/vastsky/tags/v3.0/doc/vas_install.txt?revision=367&view=markup

The install manual isn't XCP specific; it actually only refers to XCP a couple of 
times, but it seems pretty straightforward when it comes to the configs. Someone 
at ##xen-api clarified which things I need to install. There actually is an 
/etc/vas.conf on stock XCP 1.0 beta, but one still needs to install the 
RPMs to get the functionality.

So, as (also) stated in the installation document one needs (taken from the 
vas_install.txt):
<start copy paste>
vastsky-common.rpm      Common library and configuration
vastsky-hsvr.rpm        Head server agent
vastsky-ssvr.rpm        Storage server agent
vastsky-sm.rpm          Storage manager
vastsky-cli.rpm         Storage manager command-line clients
vastsky-doc.rpm         Documentations (including this file)

Basically,
- -common package is required by other packages.
- Head servers need -hsvr package.
- Storage servers need -ssvr package.
- The storage manager needs and -sm package.
- The host on which you want to run user commands needs -cli package.
<end copy paste>
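
The package rules quoted above can be written down as a small lookup. This is only a sketch and not part of vastSky: the role names ("head", "storage", "manager", "cli") and the function packages_for are made up for illustration; the package file names are the ones from the quoted list.

```python
# Hypothetical helper: map the roles a node plays to the vastsky RPMs it needs.
# Role names are illustrative; package names come from the quoted vas_install.txt.
ROLE_PACKAGES = {
    "head": "vastsky-hsvr.rpm",
    "storage": "vastsky-ssvr.rpm",
    "manager": "vastsky-sm.rpm",
    "cli": "vastsky-cli.rpm",
}
COMMON = "vastsky-common.rpm"  # required by all the other packages


def packages_for(roles):
    """Return the RPMs needed for the given set of roles, common package first."""
    unknown = set(roles) - set(ROLE_PACKAGES)
    if unknown:
        raise ValueError("unknown roles: %s" % ", ".join(sorted(unknown)))
    if not roles:
        return []
    # Iterate over ROLE_PACKAGES so the output order is deterministic.
    return [COMMON] + [pkg for role, pkg in ROLE_PACKAGES.items() if role in roles]


if __name__ == "__main__":
    # The layout described in this mail: node A does everything, B-D are head + storage.
    print("super0nodeA:", packages_for({"manager", "cli", "head", "storage"}))
    print("super0nodeB:", packages_for({"head", "storage"}))
```

For the layout in this mail, a node that is both head and storage server needs -common, -hsvr and -ssvr; the storage manager node additionally needs -sm, and -cli if the commands are run there.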

Everything I did, I did in dom0 of each server; I actually had no domUs on 
these servers when I did all this.

So, first I edited "/etc/vas.conf", which exists on all four nodes, and inserted the IP of "super0nodeA". It 
says "Comma separated list of hosts on which storage manager runs", but I remember reading somewhere that 
there can only be one instance of it. Maybe one can define multiple IPs of a single host. I didn't find anything else 
to modify in "/etc/vas.conf".

<part of vas.conf>
[storage_manager]

# host_list:
# Comma separated list of hosts on which storage manager runs.
host_list: 192.168.10.210
</part of vas.conf>
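
To double-check what ended up in host_list on each node, the fragment above can be read with Python's standard configparser. This is a minimal sketch that assumes the whole file is plain INI text like the excerpt (that is a guess; only this part of the file is shown here). The function name read_host_list and the SAMPLE text are for illustration only.

```python
import configparser

# Sample text copied from the vas.conf excerpt in this mail.
SAMPLE = """\
[storage_manager]

# host_list:
# Comma separated list of hosts on which storage manager runs.
host_list: 192.168.10.210
"""


def read_host_list(text):
    """Return the storage manager hosts from vas.conf-style INI text."""
    # interpolation=None so a literal '%' in some other value cannot break parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    raw = parser.get("storage_manager", "host_list")
    return [host.strip() for host in raw.split(",") if host.strip()]


if __name__ == "__main__":
    print(read_host_list(SAMPLE))
```

On a real node one would pass the contents of /etc/vas.conf instead of SAMPLE.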

Then I created "/var/lib/vas/register_device_list" on each node and added disks, 
following the instructions in vas_install.txt. I configured one SSD and one HDD on 
each node. Actually, I first did this on nodes B, C and D, but later on I did it 
on A as well.

I didn't modify "/etc/multipath.conf", since vas_install.txt states "This step is not necessary if you 
solely use our XCP SR driver". I also didn't modify "/etc/hosts", since I used an IP address instead of a 
host name in "/etc/vas.conf" and haven't found anywhere else to insert host names or IP addresses.

Then, after multiple reboots and plenty of googling, I went to #xen and #xen-api to ask for help. I was told that I need to install the RPMs. It was an 
"aha" moment and explained nicely why I didn't have the CLI commands available or the "/etc/init.d" scripts for the vastSky servers. So I did 
"rpm -i vastsky-hsvr.rpm" and "rpm -i vastsky-ssvr.rpm" on all nodes. I also did "rpm -i vastsky-sm.rpm" and "rpm -i 
vastsky-cli.rpm" on "super0nodeA". vastsky-common.rpm is already installed on "stock" XCP 1.0 beta and it is vastSky 2.1, so all the 
RPMs I installed were from 2.1, not 3.0, which seems to be the newest version available at: http://sourceforge.net/projects/vastsky/files/vastsky/

Then I did "/etc/init.d/vas_sm init" and "/etc/init.d/vas_sm start" on 
"super0nodeA". It seemed like I was on fire. Finally I had some processes running that I was pretty 
comfortable thinking had something to do with vastSky. Finally I had commands working, like:

- hsvr_list "list head servers"
- ssvr_list "list storage servers"
- pdsk_list "list physical disks"

However, no resources were listed, even after I issued "/etc/init.d/vas_hsvr start" and "/etc/init.d/vas_ssvr start" on nodes "super0nodeB", 
"super0nodeC" and "super0nodeD". I knew that these services had started, since "ps -aux | grep vas" told me so and also because I started getting 
lines in "/var/log/vas_<host name>.log" (not sure if that name is exactly correct, but the log files can be found in "/var/log"; there is only one starting with 
"vas" there and it is similar to what I wrote).

This is when I started wondering whether the problem might be network related. So I installed 
vastsky-hsvr.rpm and vastsky-ssvr.rpm on super0nodeA and started them. I also modified my 
"/etc/hosts" and added:

192.168.10.210 super0nodeA super0nodeA-data1 super0nodeA-data2
192.168.10.211 super0nodeB super0nodeB-data1 super0nodeB-data2
192.168.10.212 super0nodeC super0nodeC-data1 super0nodeC-data2
192.168.10.213 super0nodeD super0nodeD-data1 super0nodeD-data2

I did this to all nodes.
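
Since the same four lines have to go onto every node, they can also be generated instead of typed. A minimal sketch, using only the addresses and names from this mail; the function name hosts_lines is made up, and the "-data1"/"-data2" alias pattern is simply copied from the lines above.

```python
# Node addresses and host names as listed in this mail.
NODES = [
    ("192.168.10.210", "super0nodeA"),
    ("192.168.10.211", "super0nodeB"),
    ("192.168.10.212", "super0nodeC"),
    ("192.168.10.213", "super0nodeD"),
]


def hosts_lines(nodes, suffixes=("data1", "data2")):
    """Build /etc/hosts lines of the form 'IP name name-data1 name-data2'."""
    lines = []
    for ip, name in nodes:
        aliases = ["%s-%s" % (name, suffix) for suffix in suffixes]
        lines.append(" ".join([ip, name] + aliases))
    return lines


if __name__ == "__main__":
    # Print the block so it can be appended to /etc/hosts by hand.
    print("\n".join(hosts_lines(NODES)))
```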

This is when I finally got something out of the storage manager. If I did hsvr_list, 
ssvr_list or pdsk_list, they each printed one resource, and it was the one on 
"super0nodeA", where the storage manager was also running. So still no connections from 
the other nodes, even after I rebooted all nodes.
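
One way to narrow down whether this is a network problem is to test a plain TCP connection from nodes B, C and D to the storage manager host. The sketch below uses only the Python standard library and is not a vastSky tool. The port the storage manager listens on is not stated anywhere in this mail, so it has to be supplied on the command line; the function name can_connect is made up.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Usage: python check_port.py HOST PORT
    # HOST would be the storage manager address; PORT must be looked up,
    # it is deliberately not hard-coded here.
    if len(sys.argv) != 3:
        sys.exit("usage: check_port.py HOST PORT")
    host, port = sys.argv[1], int(sys.argv[2])
    print("%s:%d reachable: %s" % (host, port, can_connect(host, port)))
```

A False result from the remote nodes together with True on the local node would point at the network or a firewall rather than at the vastSky configuration.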

After re-re-re-re-checking all the configs I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr 
stop" and "/etc/init.d/vas_ssvr start" on "super0nodeA". About 5 s after I started vas_ssvr I 
saw my server shutting down. I tried to start it, just to see it shut itself down again right after the loading screen 
with the panda on it. All I saw was some text about stunnel and a bunch of numbers at the top of the screen. Well, I thought 
it was something I did, so I reinstalled XCP.

While I was reinstalling XCP on node A, I started to think that my problem might be node A, so I installed 
vastsky-cli.rpm and vastsky-sm.rpm on "super0nodeB" and modified "/etc/vas.conf" on nodes B, C and D (changed "host_list: 
192.168.10.210" to 192.168.10.211). Again, I had 
connections from head and storage servers, but only from the local ones. Still no connections from nodes C or D.

I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and "/etc/init.d/vas_ssvr 
start" on node B and again the server started shutting itself down. This time I had another ssh session running 
"tail -f /var/log/vas_super0nodeB.log", so even though the server shut itself down, I was able to copy and paste 
the content of the screen:

<start of log>
2010-12-19 15:47:59,435 ssvr_reporter DEBUG /opt/vas/bin/daemon_launcher -n 1 
/opt/vas/bin/DiskPatroller /var/run/DiskPatroller.run
2010-12-19 15:47:59,443 storage_manager INFO DISPATCH registerStorageServer 
called. ({'ip_data': ['192.168.10.211', '192.168.10.211'], 'ver': 3},)
2010-12-19 15:47:59,444 storage_manager INFO DISPATCH registerStorageServer EXCEPTION 
<Fault 17: 'EEXIST'>
2010-12-19 15:47:59,445 ssvr_reporter ERROR shutdown
2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h now
2010-12-19 15:47:59,501 ssvr_reporter ERROR Traceback (most recent call last):
   File "ssvr_reporter.py", line 231, in main
   File "ssvr_reporter.py", line 100, in register_resources
   File "vas_subr.py", line 68, in send_request
   File "/usr/lib/python2.4/xmlrpclib.py", line 1096, in __call__
     return self.__send(self.__name, args)
   File "/usr/lib/python2.4/xmlrpclib.py", line 1383, in __request
     verbose=self.__verbose
   File "/usr/lib/python2.4/xmlrpclib.py", line 1147, in request
     return self._parse_response(h.getfile(), sock)
   File "/usr/lib/python2.4/xmlrpclib.py", line 1286, in _parse_response
     return u.close()
   File "/usr/lib/python2.4/xmlrpclib.py", line 744, in close
     raise Fault(**self._stack[0])
Fault: <Fault 17: 'EEXIST'>
2010-12-19 15:48:00,337 storage_manager DEBUG RW.__send_request 
('192.168.10.211', '192.168.10.211') 8883 registerShredRequest {'dextid': 4, 
'capacity': 465, 'pdskid': 3, 'ver': 3, 'offset': 0}
2010-12-19 15:48:00,338 storage_manager DEBUG RW.__send_request 
('192.168.10.211', '192.168.10.211') 8883 registerShredRequest {'dextid': 2, 
'capacity': 55, 'pdskid': 2, 'ver': 3, 'offset': 0}
2010-12-19 15:48:00,340 ssvr_agent INFO DISPATCH registerShredRequest called. 
({'dextid': 4, 'ver': 3, 'pdskid': 3, 'capacity': 465, 'offset': 0},)
2010-12-19 15:48:00,342 ssvr_agent INFO DISPATCH registerShredRequest called. 
({'dextid': 2, 'ver': 3, 'pdskid': 2, 'capacity': 55, 'offset': 0},)
2010-12-19 15:48:00,343 ssvr_agent INFO false [Status 256]
2010-12-19 15:48:00,343 ssvr_agent INFO retrying(1/16) ...
2010-12-19 15:48:00,345 ssvr_agent INFO false [Status 256]
2010-12-19 15:48:00,345 ssvr_agent INFO retrying(1/16) ...
<end of log>

Notice: "2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h now"
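
To find such lines quickly in a longer log, a small filter can help. The sketch below is based only on the line layout visible in the excerpt above ("date time,msec component LEVEL message"); it is an assumption that the rest of the log looks the same. The name shutdown_events is made up. As a side note, 17 is the numeric value of the POSIX errno EEXIST ("File exists") on Linux, which matches the fault text in the log.

```python
import errno
import re

# Line layout inferred from the log excerpt in this mail.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<component>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$"
)


def shutdown_events(lines):
    """Return (timestamp, component, level, message) for messages starting with 'shutdown'."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        # Wrapped continuation lines and traceback lines do not match and are skipped.
        if match and match.group("msg").startswith("shutdown"):
            events.append(match.group("ts", "component", "level", "msg"))
    return events


# A few lines copied from the excerpt above.
SAMPLE = [
    "2010-12-19 15:47:59,444 storage_manager INFO DISPATCH registerStorageServer EXCEPTION ",
    "<Fault 17: 'EEXIST'>",
    "2010-12-19 15:47:59,445 ssvr_reporter ERROR shutdown",
    "2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h now",
]

if __name__ == "__main__":
    for event in shutdown_events(SAMPLE):
        print(event)
    # Look up the symbolic name for fault code 17 on this platform.
    print(errno.errorcode.get(17))
```

Against a real log file, the lines would come from open("/var/log/vas_super0nodeB.log") instead of SAMPLE.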

I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and 
"/etc/init.d/vas_ssvr start" on node C as well, and exactly the same happened. The server shut itself down 
and can't be started. Same stunnel... error.

This is how far I got before I stopped trying. I hope this helps someone else. I 
would also welcome input if someone has something to say.

-Henrik Andersson



_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api