Re: [Xen-users] Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success
I can't remember how it's all symlinked, but I normally find it under somewhere like:

    /sys/devices/pci0000:00/0000:00:03.0/0000:0b:00.0/0000:0c:02.0/0000:0d:00.0/reset

(The path reflects the PCI bridges along the way. Yes, I have a card behind 3 PCIe bridges on my motherboard (5520->NF200->NF200->GPU), and that's not even the GTX690, which would add at least one more bridge to the path. Madness.)

If the nvidia driver isn't exposing it, you could try unloading the nvidia driver, loading the nouveau driver (make sure mode switching is disabled so the console doesn't bind to it and leave it in a state where it can't be unloaded), issuing a reset (if that exposes a reset node, which IIRC it does on Fermi+ GPUs), unloading nouveau, and reloading nvidia.ko. Then see if it works after that.

Gordan

On Tue, 19 Nov 2013 14:22:48 +0100, Tamas Lengyel <tamas.lengyel@xxxxxxxxxxxx> wrote:

I don't see reset unfortunately:

    ls /sys/module/nvidia/drivers/pci:nvidia/0000:00:04.0
    boot_vga                  d3cold_allowed  enable         i2c-3          msi_bus    rescan        resource3     subsystem_device
    broken_parity_status      device          firmware_node  irq            msi_irqs   resource      resource3_wc  subsystem_vendor
    class                     dma_mask_bits   i2c-0          local_cpulist  numa_node  resource0     resource5     uevent
    config                    driver          i2c-1          local_cpus     power      resource1     rom           vendor
    consistent_dma_mask_bits  drm             i2c-2          modalias       remove     resource1_wc  subsystem

On Tue, Nov 19, 2013 at 11:32 AM, Gordan Bobic wrote:

Does the nvidia binary driver provide a reset handle for the device via sysfs? If you echo 1 into it, does it help or does it crash things?

On Tue, 19 Nov 2013 10:32:31 +0100, Tamas Lengyel wrote:

Hi everyone,

after following in the footsteps of the following discussion (http://lists.xenproject.org/archives/html/xen-users/2013-09/msg00106.html [1]) I was able to turn my GTX 480 into a Quadro 6000.
When I VT-d passthrough it to a Debian jessie VM it shows up fine and CUDA 5.5 seems to function properly up to a point:

lspci -v:

    00:04.0 VGA compatible controller: NVIDIA Corporation GF100GL [Quadro 6000] (rev a3) (prog-if 00 [VGA controller])
            Subsystem: ASUSTeK Computer Inc. Device 075f
            Physical Slot: 4
            Flags: bus master, fast devsel, latency 0, IRQ 32
            Memory at ee000000 (32-bit, non-prefetchable) [size=32M]
            Memory at e0000000 (64-bit, prefetchable) [size=128M]
            Memory at e8000000 (64-bit, prefetchable) [size=64M]
            I/O ports at c100 [size=128]
            Expansion ROM at f1000000 [disabled] [size=512K]
            Capabilities: [60] Power Management version 3
            Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
            Capabilities: [78] Express Endpoint, MSI 00
            Capabilities: [b4] Vendor Specific Information: Len=14
            Kernel driver in use: nvidia

    00:05.0 Audio device: NVIDIA Corporation GF100 High Definition Audio Controller (rev a1)
            Subsystem: ASUSTeK Computer Inc. Device 075f
            Physical Slot: 5
            Flags: bus master, fast devsel, latency 0, IRQ 37
            Memory at f1080000 (32-bit, non-prefetchable) [size=16K]
            Capabilities: [60] Power Management version 3
            Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
            Capabilities: [78] Express Endpoint, MSI 00
            Kernel driver in use: snd_hda_intel

    NVIDIA_CUDA-5.5_Samples/1_Utilities/deviceQuery# ./deviceQuery
    ./deviceQuery Starting...

     CUDA Device Query (Runtime API) version (CUDART static linking)

    Detected 1 CUDA Capable device(s)

    Device 0: "Quadro 6000"
      CUDA Driver Version / Runtime Version          6.0 / 5.5
      CUDA Capability Major/Minor version number:    2.0
      Total amount of global memory:                 1536 MBytes (1610285056 bytes)
      (15) Multiprocessors, ( 32) CUDA Cores/MP:     480 CUDA Cores
      GPU Clock rate:                                1401 MHz (1.40 GHz)
      Memory Clock rate:                             1848 Mhz
      Memory Bus Width:                              384-bit
      L2 Cache Size:                                 786432 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
      Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 32768
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  1536
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Device PCI Bus ID / PCI location ID:           0 / 4
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = Quadro 6000
    Result = PASS

Unfortunately if I try to run
any CUDA app or even nvidia-smi afterwards, I get the following errors:

    NVIDIA_CUDA-5.5_Samples/1_Utilities/deviceQuery# ./deviceQuery
    ./deviceQuery Starting...

     CUDA Device Query (Runtime API) version (CUDART static linking)

    cudaGetDeviceCount returned 10
    -> invalid device ordinal
    Result = FAIL

    # nvidia-smi
    Unable to determine the device handle for GPU 0000:00:04.0: The NVIDIA kernel module detected an issue with GPU interrupts. Consult the "Common Problems" Chapter of the NVIDIA Driver README for details and steps that can be taken to resolve this issue.

If I restart the VM I can run a single CUDA app again, once. It's still pretty impressive to be able to do that without having to patch Xen or reboot the entire machine =)

It doesn't seem to matter what CUDA app I'm running, here is matrixMul for example:

    matrixMul# ./matrixMul
    [Matrix Multiply Using CUDA] - Starting...
    GPU Device 0: "Quadro 6000" with compute capability 2.0

    MatrixA(320,320), MatrixB(640,320)
    Computing result using CUDA Kernel...
    done
    Performance= 227.22 GFlop/s, Time= 0.577 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
    Checking computed result for correctness: Result = PASS

    Note: For peak performance, please refer to the matrixMulCUBLAS example.

Anyhoo, does anyone have any idea what I might be able to tweak so I can avoid this issue? The setup clearly seems to work for the most part.
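A minimal sketch, in Python, of how the "works once, then fails" behaviour described above could be reproduced from a script rather than by hand. It runs a sample binary a few times and reports the first run whose output does not report `Result = PASS`. The function names (`result_of`, `run_sample`, `first_failing_run`) are made up for this illustration and are not part of the CUDA samples; the only thing taken from the output above is the `Result = PASS` / `Result = FAIL` line.

```python
import re
import subprocess

# The CUDA samples quoted above print "Result = PASS" or "Result = FAIL".
RESULT_RE = re.compile(r"Result\s*=\s*(PASS|FAIL)")


def result_of(output):
    """Return 'PASS', 'FAIL', or None if the text has no 'Result = ...' marker."""
    matches = RESULT_RE.findall(output)
    return matches[-1] if matches else None


def run_sample(cmd):
    """Run one sample binary and return its combined stdout/stderr as text."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.stdout


def first_failing_run(cmd, attempts=3, runner=run_sample):
    """Run cmd up to `attempts` times.

    Returns the 1-based index of the first run that does not report PASS,
    or None if every run passed. `runner` is replaceable (hypothetical
    hook) so the logic can be exercised without a GPU.
    """
    for i in range(1, attempts + 1):
        if result_of(runner(cmd)) != "PASS":
            return i
    return None
```

Called as `first_failing_run(["./deviceQuery"])` from the sample directory, this would presumably return 2 on the setup described here (first run passes, second fails) and None on a healthy one.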
My domU config:

    arch = 'x86_64'
    name = "debian-miner"
    builder = "hvm"
    maxmem = 512
    memory = 512
    vcpus = 1
    maxcpus = 1
    boot = "cd"
    pae=1
    acpi = 1
    apic = 1
    hap=1
    hpet=1
    shadow_memory = 32
    on_poweroff = "destroy"
    on_reboot = "restart"
    on_crash = "restart"
    vnc=1
    vncunused=1
    vnclisten="0.0.0.0"
    vif = [ 'type=netfront,bridge=xenbr0,mac=00:16:3e:12:c3:fa']
    device_model_version="qemu-xen-traditional"
    gfx_passthru=0
    xen_platform_pci=1
    pci = [ '01:00.0', '01:00.1' ]
    pci_msitranslate = 1
    pci_power_mgmt = 1
    pci_permissive = 1
    xen_extended_power_mgmt = 1
    acpi_s3 = 1
    acpi_s4 = 1
    disk = [ 'phy:/dev/t0vg/debian-testing,xvda,w'];

And I'm running on Xen 4.3.1 with NVIDIA driver 331.20 x86_64 in the domU.

Thanks and cheers!

Links:
------
[1] http://lists.xenproject.org/archives/html/xen-users/2013-09/msg00106.html

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
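On the reset node discussed at the top of this thread: a minimal Python sketch, assuming a Linux guest with sysfs mounted at /sys. Each PCI device directory is also reachable through the symlinks under /sys/bus/pci/devices/<address>/, so the bridge path does not have to be known in advance. The `reset` file only exists when the kernel has a reset method for the device, and writing to it needs root. The function names and the `sysfs_root` parameter are hypothetical, introduced here only so the lookup can be tried against a scratch directory.

```python
import os


def find_reset_node(pci_addr, sysfs_root="/sys"):
    """Return the path of the device's sysfs 'reset' file, or None if absent.

    pci_addr is the full address including domain, e.g. "0000:00:04.0".
    """
    path = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "reset")
    return path if os.path.exists(path) else None


def reset_device(pci_addr, sysfs_root="/sys"):
    """Write '1' to the reset node (the 'echo 1' step mentioned above).

    Returns False when the device exposes no reset node, True after the
    write. On a real system this raises PermissionError unless run as root.
    """
    node = find_reset_node(pci_addr, sysfs_root)
    if node is None:
        return False
    with open(node, "w") as f:
        f.write("1")
    return True
```

`reset_device("0000:00:04.0")` would target the GPU address shown in the lspci output above; it returns False where no reset node exists, as in the directory listing earlier in the thread. Unloading nvidia, loading nouveau with mode setting disabled, and reloading nvidia.ko afterwards are left out of the sketch, and whether a reset actually clears the interrupt problem is untested here.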