
Re: [Xen-users] Some kind of hardware issue



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I ran into strange disk issues when I tried to upgrade to the pvops
kernels on two of my AMD motherboards. On one of them, all I had to
do was try to copy a large file and I'd get a slew of disk errors. I
went back to xen-3.x and a xenified kernel (from openSUSE) and all was
well.

See http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1806
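
A minimal sketch for checking whether a machine shows the same symptom:
copy a large file while running the Xen/pvops kernel, then count the
libata error lines in the kernel log. The function name
count_ata_errors is made up for illustration, and the message patterns
are assumed examples of common libata errors, not an exhaustive list and
not taken from the bug report above.

```shell
#!/bin/sh
# Count kernel log lines that look like libata error reports.
# Reads log text on stdin and prints the number of matching lines.
# The patterns below are assumptions; adjust them to what your
# dmesg actually prints.
count_ata_errors() {
    # grep -c exits non-zero when nothing matches, so mask that
    # to keep the function usable under "set -e".
    grep -E -c 'ata[0-9]+(\.[0-9]+)?: (exception Emask|failed command|hard resetting link)' || true
}

# Usage (dmesg may need root):
#   dmesg | count_ata_errors
```

A count that rises during the copy under the Xen kernel but stays at 0
under the stock kernel would point at the kernel/controller combination
rather than the disk.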

On 03/01/2013 05:55 PM, Daniel Hood wrote:
> Hi all,
> 
> First time posting.
> 
> So the story goes: I bought a dedicated box from a hosting company.
> Opteron 1218 and MSI motherboard (not sure which model exactly,
> LSPCI output is below). At different stages of this issue I've tried
> installing Debian 6, Ubuntu 12.04 and CentOS 6, then installing the
> Xen 4 hypervisor on each. All three boot their normal kernels
> perfectly. I can't seem to find any related errors.
> 
> I then try to boot into my Xen kernel and these are the errors I'm
> getting: http://i.imgur.com/LHq7KCH.png 
> http://i.imgur.com/fEfnm0I.png
> 
> I've tried booting back into the normal kernels and everything works.
> I've tried adding 'noacpi', 'acpi=off' and 'libata.force=noncq' on
> both the kernel and the module lines. No idea what else to try. Any
> ideas, anyone?
> 
> Here are the outputs from the CentOS attempts:
> 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> [root@virt-host01 init.d]# cat /boot/grub/grub.conf
> #
> # Hetzner Online AG - installimage
> # GRUB bootloader configuration file
> #
> 
> timeout 5
> default 0
> 
> title CentOS (3.7.10-1.el6xen.x86_64)
> root (hd0,1)
> kernel /xen.gz dom0_mem=1024M cpufreq=xen dom0_max_vcpus=1 dom0_vcpus_pin
> module /boot/vmlinuz-3.7.10-1.el6xen.x86_64 ro root=/dev/md1 rd_NO_LUKS rd_NO_DM nomodeset SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de
> module /boot/initramfs-3.7.10-1.el6xen.x86_64.img
> 
> title CentOS (3.7.10-1.el6xen.x86_64)
> root (hd0,1)
> kernel /boot/vmlinuz-3.7.10-1.el6xen.x86_64 ro root=/dev/md1 rd_NO_LUKS rd_NO_DM nomodeset SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=de
> initrd /boot/initramfs-3.7.10-1.el6xen.x86_64.img
> 
> title CentOS (2.6.32-279.22.1.el6.x86_64)
> root (hd0,1)
> kernel /boot/vmlinuz-2.6.32-279.22.1.el6.x86_64 ro root=/dev/md1 rd_NO_LUKS rd_NO_DM nomodeset
> initrd /boot/initramfs-2.6.32-279.22.1.el6.x86_64.img
> 
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> LSPCI output:
> 
> 00:00.0 RAM memory: NVIDIA Corporation C51 Host Bridge (rev a2)
> 00:00.1 RAM memory: NVIDIA Corporation C51 Memory Controller 0 (rev a2)
> 00:00.2 RAM memory: NVIDIA Corporation C51 Memory Controller 1 (rev a2)
> 00:00.3 RAM memory: NVIDIA Corporation C51 Memory Controller 5 (rev a2)
> 00:00.4 RAM memory: NVIDIA Corporation C51 Memory Controller 4 (rev a2)
> 00:00.5 RAM memory: NVIDIA Corporation C51 Host Bridge (rev a2)
> 00:00.6 RAM memory: NVIDIA Corporation C51 Memory Controller 3 (rev a2)
> 00:00.7 RAM memory: NVIDIA Corporation C51 Memory Controller 2 (rev a2)
> 00:02.0 PCI bridge: NVIDIA Corporation C51 PCI Express Bridge (rev a1)
> 00:03.0 PCI bridge: NVIDIA Corporation C51 PCI Express Bridge (rev a1)
> 00:04.0 PCI bridge: NVIDIA Corporation C51 PCI Express Bridge (rev a1)
> 00:05.0 VGA compatible controller: NVIDIA Corporation C51 [Quadro NVS 210S/GeForce 6150LE] (rev a2)
> 00:09.0 RAM memory: NVIDIA Corporation MCP51 Host Bridge (rev a2)
> 00:0a.0 ISA bridge: NVIDIA Corporation MCP51 LPC Bridge (rev a3)
> 00:0a.1 SMBus: NVIDIA Corporation MCP51 SMBus (rev a3)
> 00:0b.0 USB controller: NVIDIA Corporation MCP51 USB Controller (rev a3)
> 00:0b.1 USB controller: NVIDIA Corporation MCP51 USB Controller (rev a3)
> 00:0d.0 IDE interface: NVIDIA Corporation MCP51 IDE (rev a1)
> 00:0e.0 IDE interface: NVIDIA Corporation MCP51 Serial ATA Controller (rev a1)
> 00:0f.0 IDE interface: NVIDIA Corporation MCP51 Serial ATA Controller (rev a1)
> 00:10.0 PCI bridge: NVIDIA Corporation MCP51 PCI Bridge (rev a2)
> 00:14.0 Bridge: NVIDIA Corporation MCP51 Ethernet Controller (rev a3)
> 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
> 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
> 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
> 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
> 
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Smartctl output:
> 
> root@rescue ~ # smartctl -a /dev/sda
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.4.28] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF INFORMATION SECTION ===
> Model Family:     SAMSUNG SpinPoint T166
> Device Model:     SAMSUNG HD321KJ
> Serial Number:    S0MQJDQP603258
> LU WWN Device Id: 5 0000f0 0db603258
> Firmware Version: CP100-10
> User Capacity:    320,072,933,376 bytes [320 GB]
> Sector Size:      512 bytes logical/physical
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 3b
> Local Time is:    Fri Mar 1 23:50:44 2013 CET
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> General SMART Values:
> Offline data collection status:  (0x00) Offline data collection activity
>                                         was never started.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (  41) The self-test routine was interrupted
>                                         by the host with a hard or soft reset.
> Total time to complete Offline
> data collection:                 ( 5746) seconds.
> Offline data collection
> capabilities:                    (0x5b) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         No Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        (  97) minutes.
> SCT capabilities:              (0x003f) SCT Status supported.
>                                         SCT Error Recovery Control supported.
>                                         SCT Feature Control supported.
>                                         SCT Data Table supported.
> 
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       1
>   3 Spin_Up_Time            0x0007   100   100   015    Pre-fail  Always       -       5696
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       21
>   5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
>   8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       6877
>  10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0012   253   253   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
> 187 Reported_Uncorrect      0x0032   253   253   000    Old_age   Always       -       0
> 188 Command_Timeout         0x0032   253   253   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   064   060   000    Old_age   Always       -       36
> 194 Temperature_Celsius     0x0022   130   118   000    Old_age   Always       -       36
> 195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       461965997
> 196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
> 197 Total_Pending_Sectors   0x0012   253   253   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
> 201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0
> 202 Data_Address_Mark_Errs  0x0032   253   253   000    Old_age   Always       -       0
> 
> SMART Error Log Version: 1
> No Errors Logged
> 
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Interrupted (host reset)      90%      6876         -
> # 2  Extended offline    Interrupted (host reset)      90%      6869         -
> # 3  Extended offline    Interrupted (host reset)      90%      6868         -
> # 4  Extended offline    Completed without error       00%      6499         -
> 
> Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
> SMART Selective self-test log data structure revision number 0
> Note: revision number not 1 implies that no selective self-test has ever been run
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
> 
> _______________________________________________ Xen-users mailing
> list Xen-users@xxxxxxxxxxxxx http://lists.xen.org/xen-users
> 
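
On the boot options in the quoted grub.conf: with GRUB legacy and Xen,
the "kernel" line loads the hypervisor and takes hypervisor options,
while the first "module" line is the dom0 Linux kernel and takes Linux
options such as libata.force=noncq. A rough sketch for printing the
line that actually reaches the dom0 kernel, so you can confirm an option
landed where Linux will see it. The function name dom0_cmdlines is
hypothetical, and it assumes the hypervisor image path ends in ".gz" as
it does in the config above.

```shell
#!/bin/sh
# Print the dom0 Linux command line of every Xen stanza in a GRUB legacy
# config read from stdin: the first "module" line that follows a
# "kernel" line whose image looks like a Xen hypervisor (xen*.gz).
dom0_cmdlines() {
    awk '
        $1 == "title"                       { in_xen = 0 }
        $1 == "kernel" && $2 ~ /xen.*\.gz$/ { in_xen = 1; next }
        $1 == "module" && in_xen            { print; in_xen = 0 }
    '
}

# Usage:
#   dom0_cmdlines < /boot/grub/grub.conf
#   dom0_cmdlines < /boot/grub/grub.conf | grep -c 'libata.force=noncq'
```

The second "module" line (the initramfs) is deliberately skipped, and
stanzas that boot the Linux kernel directly produce no output.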

- -- 
Tony Lill, OCT,                    Tony.Lill@xxxxxxxxxxxxxxxxxxx
President, A. J. Lill Consultants                 (519) 650 0660
539 Grand Valley Dr., Cambridge, Ont. N3H 2S2     (519) 241 2461
- --------------- http://www.ajlc.waterloo.on.ca/ ----------------


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iEYEARECAAYFAlExWz8ACgkQGS8yZq1uvxA6RACePHzUrdXxFElp2IllVxvx86ej
3IEAn1CNRtuV5Dv6oBwPtK5j7VHopdOl
=HrpZ
-----END PGP SIGNATURE-----
