Re: [Xen-devel] [PATCH] xen: correctly restore pfn_to_mfn_list_list after resume



On Saturday, 21 November 2009 at 12:32:49, Ian Campbell wrote:
> pvops kernels >= 2.6.30 can currently only be saved and restored once. The
> second attempt to save results in:
> 
>     ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys
>     ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000
>     ERROR Internal error: Failed to map/save the p2m frame list
> 
> I finally narrowed it down to:
> 
>     commit cdaead6b4e657f960d6d6f9f380e7dfeedc6a09b
>     Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
>     Date:   Fri Feb 27 15:34:59 2009 -0800
>
>         xen: split construction of p2m mfn tables from registration
>
>         Build the p2m_mfn_list_list early with the rest of the p2m table, but
>         register it later when the real shared_info structure is in place.
>
>         Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> 
> The unforeseen side-effect of this change was to cause the mfn list list to
> not be rebuilt on resume. Prior to this change it would have been rebuilt
> via xen_post_suspend() -> xen_setup_shared_info() ->
> xen_setup_mfn_list_list().
>
> Fix by explicitly calling xen_build_mfn_list_list() from
> xen_post_suspend().
> 
[---]
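
For reference, the error quoted above looks like a range check on the pfn-to-mfn frame list. The following is only a toy sketch of such a check, not the actual save code: the function name `find_bad_frames` is hypothetical, and the exact comparison used by the tools is an assumption. The two numbers are taken from the quoted message.

```python
def find_bad_frames(p2m_frame_list, max_pfn):
    """Return (index, value) pairs for entries above max_pfn.

    Hypothetical helper: models the kind of range check implied by the
    quoted error message, not the real implementation.
    """
    return [(i, pfn) for i, pfn in enumerate(p2m_frame_list) if pfn > max_pfn]


# Values copied from the error message in the quoted mail.
MAX_PFN = 0x120000
for index, pfn in find_bad_frames([0xF2C2C2C2], MAX_PFN):
    print(f"entry {index}: p2m_frame_list[{index}] is {pfn:#x}, max {MAX_PFN:#x}")
```

A stale list left over from before the first save would fail a check of this kind on the second save, which matches the symptom described.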

Ian,

I downloaded and compiled the pvops kernel with your fixes a week ago
(commit e14a6cdfdf5b40330297701b4e6963f9eff6d8df Sat, 21 Nov 2009 23:59:07
+0000 (07:59 +0800)). It has now been running stably as dom0 for about 5 days
on a dual AMD Opteron 248 and a dual Intel Xeon E5520.

1. Opteron 248 guest

For that whole time I have been compiling the Linux kernel in a loop (~700
compilation rounds) on a virtual machine with 2 vcpus. From time to time I
migrated the machine back and forth between two physical machines, both with
Opteron 248s. Save/restore/save/restore works fine: the kernel continues to
compile, and even the ssh session was not closed.

I have tested only 64-bit kernels/userlands in both dom0 and domU.

2. Xeon E5520 guest

With 64-bit kernels/userlands in both dom0 and domU, save/restore/save/restore
works fine: the kernel continues to compile and the ssh session stays open. I
used 2 vcpus in the guest.

I cannot test live migration on the Xeon E5520 (no SAN connection).

Unfortunately, save/restore does not work with a 64-bit kernel/userland in dom0
and a 32-bit kernel/userland in domU (tested with 1 and then with 2 vcpus). The
save hangs. The save file is ~1.5kB long and I get this on the guest's console:
----8<----
[   34.729250] BUG: unable to handle kernel paging request at c1527000
[   34.729271] IP: [<c1006593>] xen_set_pmd+0x73/0xb0
[   34.729288] *pdpt = 0000000403162027
[   34.729299] Oops: 0003 [#1] SMP
[   34.729312] last sysfs file: /sys/module/ip_tables/initstate
[   34.729321] Modules linked in: sch_sfq xt_limit ipt_REJECT xt_tcpudp ipt_LOG xt_state xt_multiport iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables x_tables xenfs dm_multipath scsi_dh dm_mod st sd_mod crc_t10dif lpfc qla2xxx scsi_transport_fc scsi_tgt qla1280 scsi_mod psmouse uhci_hcd ehci_hcd usbcore pcspkr xen_netfront evdev ext3 jbd mbcache
[   34.729485]
[   34.729493] Pid: 1686, comm: kstop/0 xid: #0 Not tainted (2.6.31.6x_xenUnogrsecuritypae-BL5.5 #1)
[   34.729504] EIP: 0061:[<c1006593>] EFLAGS: 00010046 CPU: 0
[   34.729513] EIP is at xen_set_pmd+0x73/0xb0
[   34.729520] EAX: c1527000 EBX: 031f3067 ECX: 00000004 EDX: c179b000
[   34.729529] ESI: 00000004 EDI: c1527000 EBP: ddd75eb0 ESP: ddd75ea0
[   34.729538]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[   34.729547] Process kstop/0 (pid: 1686, ti=ddd74000 task=df8f48c0 task.ti=ddd74000)
[   34.729557] Stack:
[   34.729563]  c1527000 031f3067 1fd61067 c1527000 ddd75ec4 c10cd650 00000000 00000000
[   34.729595] <0> 00200000 ddd75f20 c10ceb14 00000000 123ab067 00000000 00000fff 00001000
[   34.729630] <0> 00000fff c1463000 c1474f60 01ba9067 00000000 ddd75f14 c1006f3a c153001c
[   34.729670] Call Trace:
[   34.729682]  [<c10cd650>] ? __pte_alloc_kernel+0xa0/0xb0
[   34.729693]  [<c10ceb14>] ? apply_to_page_range+0x314/0x330
[   34.729705]  [<c1006f3a>] ? xen_force_evtchn_callback+0x1a/0x30
[   34.729717]  [<c10079c6>] ? arch_gnttab_unmap+0x26/0x30
[   34.729729]  [<c1007950>] ? unmap_pte_fn+0x0/0x50
[   34.729742]  [<c1204591>] ? gnttab_suspend+0x41/0x50
[   34.729753]  [<c120756a>] ? xen_suspend+0x3a/0xf0
[   34.729765]  [<c108873d>] ? stop_cpu+0x8d/0xd0
[   34.729776]  [<c1054022>] ? worker_thread+0x112/0x220
[   34.729787]  [<c10886b0>] ? stop_cpu+0x0/0xd0
[   34.729798]  [<c10587e0>] ? autoremove_wake_function+0x0/0x40
[   34.729810]  [<c1053f10>] ? worker_thread+0x0/0x220
[   34.729821]  [<c10584ec>] ? kthread+0x7c/0x90
[   34.729831]  [<c1058470>] ? kthread+0x0/0x90
[   34.729843]  [<c100ad17>] ? kernel_thread_helper+0x7/0x10
[   34.729851] Code: 00 75 48 8b 45 f0 89 da 89 f1 83 05 fc 32 53 c1 01 e8 e2 fe ff ff 8b 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 90 8d 74 26 00 8b 45 f0 <89> 18 89 70 04 eb e4 ba e0 32 53 c1 b9 33 00 00 00 31 c0 89 d7
[   34.730076] EIP: [<c1006593>] xen_set_pmd+0x73/0xb0 SS:ESP 0069:ddd75ea0
[   34.730093] CR2: 00000000c1527000
[   34.730102] ---[ end trace cd1b831872a4c87f ]---
[   34.730137] ------------[ cut here ]------------
[   34.730147] WARNING: at /root/rpm/BUILD/kernel-xenUnogrsecuritypae-2.6.31.6x/linux-2.6.31/kernel/time/timekeeping.c:102 getnstimeofday+0x102/0x110()
[   34.730160] Modules linked in: sch_sfq xt_limit ipt_REJECT xt_tcpudp ipt_LOG xt_state xt_multiport iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables x_tables xenfs dm_multipath scsi_dh dm_mod st sd_mod crc_t10dif lpfc qla2xxx scsi_transport_fc scsi_tgt qla1280 scsi_mod psmouse uhci_hcd ehci_hcd usbcore pcspkr xen_netfront evdev ext3 jbd mbcache
[   34.730316] Pid: 0, comm: swapper xid: #0 Tainted: G      D    2.6.31.6x_xenUnogrsecuritypae-BL5.5 #1
[   34.730326] Call Trace:
[   34.730338]  [<c1333d7a>] ? printk+0x18/0x1e
[   34.730349]  [<c1040fcd>] warn_slowpath_common+0x6d/0xa0
[   34.730360]  [<c106b7d2>] ? getnstimeofday+0x102/0x110
[   34.730370]  [<c106b7d2>] ? getnstimeofday+0x102/0x110
[   34.730381]  [<c1041015>] warn_slowpath_null+0x15/0x20
[   34.730392]  [<c106b7d2>] getnstimeofday+0x102/0x110
[   34.730403]  [<c105c716>] ktime_get_ts+0x26/0x60
[   34.730413]  [<c105c766>] ktime_get+0x16/0x40
[   34.730425]  [<c107056c>] tick_nohz_stop_sched_tick+0x6c/0x390
[   34.730437]  [<c1009187>] cpu_idle+0x27/0x80
[   34.730449]  [<c1323e25>] rest_init+0x55/0x60
[   34.730461]  [<c14a186c>] start_kernel+0x2fb/0x301
[   34.730472]  [<c14a138e>] ? unknown_bootoption+0x0/0x1ad
[   34.730483]  [<c14a108d>] i386_start_kernel+0x7c/0x83
[   34.730494]  [<c14a418e>] xen_start_kernel+0x517/0x51f
[   34.730502] ---[ end trace cd1b831872a4c880 ]---
----8<----
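
As a reading aid for the trace above: the value after "Oops:" is the x86 page-fault error code, printed in hex. The sketch below decodes its three low bits according to the architectural definition; the helper name `decode_oops_code` is hypothetical and the wording of the descriptions is mine.

```python
# Low bits of the x86 page-fault error code:
# (mask, meaning when the bit is set, meaning when it is clear)
PF_BITS = [
    (0x01, "protection violation (page was present)", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
]


def decode_oops_code(code_field):
    """Decode the hex field printed after 'Oops:' into short descriptions."""
    code = int(code_field, 16)
    return [on if code & mask else off for mask, on, off in PF_BITS]


# "Oops: 0003" from the console output above.
print(decode_oops_code("0003"))
```

For 0003 this gives a kernel-mode write to a page that was present. That would be consistent with xen_set_pmd writing directly to a page that is mapped read-only, although the trace alone does not prove why the page was not writable.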

I'm going to try newer commits.

Regards,

-- 
Bartosz Lis @ Inst. of Information Technology, Technical Univ. of Lodz, Poland
   bartoszl @ ics.p.lodz.pl

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

