
Re: [Xen-users] Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success



Unfortunately, I don't see a reset entry:

ls /sys/module/nvidia/drivers/pci\:nvidia/0000\:00\:04.0
boot_vga  d3cold_allowed  enable i2c-3 msi_bus    rescan resource3     subsystem_device
broken_parity_status  device  firmware_node  irq msi_irqs   resource resource3_wc  subsystem_vendor
class  dma_mask_bits   i2c-0 local_cpulist numa_node  resource0 resource5     uevent
config  driver  i2c-1 local_cpus power   resource1 rom       vendor
consistent_dma_mask_bits  drm  i2c-2 modalias remove   resource1_wc  subsystem



On Tue, Nov 19, 2013 at 11:32 AM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
Does the nvidia binary driver provide a reset handle for the device via sysfs?
If you echo 1 into it, does it help or does it crash things?
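For reference, the Linux PCI core creates a `reset` file in a device's sysfs directory only when the kernel knows a reset method for that device, and writing 1 to it requests a reset of that function. A minimal sketch of the check, assuming the standard `/sys/bus/pci/devices` layout; the function names `find_reset_handle` and `trigger_reset` are hypothetical helpers made up for illustration, not part of any driver or tool:

```python
import os

def find_reset_handle(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the path of the device's sysfs reset file, or None if absent.

    bdf is the PCI address as it appears in sysfs, e.g. "0000:00:04.0".
    sysfs_root is a parameter only so the helper can be pointed elsewhere.
    """
    path = os.path.join(sysfs_root, bdf, "reset")
    return path if os.path.exists(path) else None

def trigger_reset(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Write "1" to the reset file if it exists; return whether it was written.

    This is the equivalent of `echo 1 > .../reset` and needs root on a
    real system. Whether the device recovers afterwards is not guaranteed.
    """
    path = find_reset_handle(bdf, sysfs_root)
    if path is None:
        return False
    with open(path, "w") as handle:
        handle.write("1")
    return True

# Example (run as root inside the domU):
#   if not trigger_reset("0000:00:04.0"):
#       print("no reset handle exposed for this device")
```

If the file is absent, as in the listing above, the kernel has no reset method it considers usable for the device, so there is nothing to echo into.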



On Tue, 19 Nov 2013 10:32:31 +0100, Tamas Lengyel <tamas.lengyel@xxxxxxxxxxxx> wrote:
Hi everyone,

Following the steps in this earlier discussion
(http://lists.xenproject.org/archives/html/xen-users/2013-09/msg00106.html [1]),
I was able to turn my GTX 480 into a Quadro 6000. When I pass it through
with VT-d to a Debian jessie VM, it shows up fine and CUDA 5.5 seems to
function properly up to a point:

lspci -v:

00:04.0 VGA compatible controller: NVIDIA Corporation GF100GL [Quadro 6000] (rev a3) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Device 075f
        Physical Slot: 4
        Flags: bus master, fast devsel, latency 0, IRQ 32
        Memory at ee000000 (32-bit, non-prefetchable) [size=32M]
        Memory at e0000000 (64-bit, prefetchable) [size=128M]
        Memory at e8000000 (64-bit, prefetchable) [size=64M]
        I/O ports at c100 [size=128]
        Expansion ROM at f1000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14
        Kernel driver in use: nvidia

00:05.0 Audio device: NVIDIA Corporation GF100 High Definition Audio Controller (rev a1)
        Subsystem: ASUSTeK Computer Inc. Device 075f
        Physical Slot: 5
        Flags: bus master, fast devsel, latency 0, IRQ 37
        Memory at f1080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Kernel driver in use: snd_hda_intel
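
In `lspci -v` output a trailing `+` or `-` on a flag marks it as set or clear, so `MSI: Enable-` above means MSI was not enabled on either function when the listing was taken. Whether that has anything to do with the interrupt error reported further down is not established in this thread. A minimal sketch for pulling that flag out of a saved listing, assuming the short `bus:dev.fn` address format shown above; `msi_states` is a hypothetical helper name:

```python
import re

def msi_states(lspci_text):
    """Map each PCI function (e.g. '00:04.0') to True/False for MSI Enable+/-.

    Functions without an "MSI: Enable" capability line are left out.
    """
    states = {}
    current = None
    for line in lspci_text.splitlines():
        # A device header starts at column 0 with an address like 00:04.0
        header = re.match(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s", line)
        if header:
            current = header.group(1)
            continue
        # "MSI-X: Enable" lines do not match this pattern, only plain MSI
        flag = re.search(r"MSI: Enable([+-])", line)
        if flag and current is not None:
            states[current] = flag.group(1) == "+"
    return states
```

Running it on the output of `lspci -v` before and after the first CUDA run would show whether the flag changes between the working and the failing state.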

NVIDIA_CUDA-5.5_Samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro 6000"
  CUDA Driver Version / Runtime Version          6.0 / 5.5
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 1536 MBytes (1610285056 bytes)
  (15) Multiprocessors, ( 32) CUDA Cores/MP:     480 CUDA Cores
  GPU Clock rate:                                1401 MHz (1.40 GHz)
  Memory Clock rate:                             1848 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 4
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = Quadro 6000
Result = PASS

Unfortunately if I try to run any CUDA app or even nvidia-smi
 afterwards, I get the following errors:

NVIDIA_CUDA-5.5_Samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 10
-> invalid device ordinal
Result = FAIL

# nvidia-smi
Unable to determine the device handle for GPU 0000:00:04.0: The NVIDIA kernel module detected an issue with GPU interrupts.Consult the "Common Problems" Chapter of the NVIDIA Driver README for details and steps that can be taken to resolve this issue.
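
To reproduce the "works once, then fails" pattern without eyeballing the output each time, the two deviceQuery runs above can be summarised mechanically. A minimal sketch, assuming the output format shown in this message; `parse_device_query` is a hypothetical helper name, and the regular expressions are tailored to these two samples rather than to every possible deviceQuery output:

```python
import re

def parse_device_query(output):
    """Summarise one deviceQuery run: pass/fail plus any CUDA error reported."""
    result = re.search(r"Result\s*=\s*(PASS|FAIL)", output)
    # Matches lines such as:
    #   cudaGetDeviceCount returned 10
    #   -> invalid device ordinal
    error = re.search(r"(\w+) returned (\d+)\s*\n\s*->\s*(.+)", output)
    return {
        "passed": bool(result) and result.group(1) == "PASS",
        "error_call": error.group(1) if error else None,
        "error_code": int(error.group(2)) if error else None,
        "error_text": error.group(3).strip() if error else None,
    }
```

Feeding it the captured output of each run after a VM restart gives a quick record of which run in the sequence first fails and with which error code.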

If I restart the VM I can run a single CUDA app again, once. It's
still pretty impressive to be able to do that without having to patch
Xen or reboot the entire machine =) It doesn't seem to matter which
CUDA app I run; here is matrixMul, for example:

matrixMul# ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Quadro 6000" with compute capability 2.0

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 227.22 GFlop/s, Time= 0.577 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

Note: For peak performance, please refer to the matrixMulCUBLAS example.

Anyhoo, does anyone have any idea what I might be able to tweak to
avoid this issue? The setup clearly seems to work for the most part.

My domU config:

arch = 'x86_64'
name = "debian-miner"
builder = "hvm"
maxmem = 512
memory = 512
vcpus = 1
maxcpus = 1
boot = "cd"
pae=1
acpi = 1
apic = 1
hap=1
hpet=1
shadow_memory = 32
vnc=1
vncunused=1
vnclisten="0.0.0.0"
vif = [ 'type=netfront,bridge=xenbr0,mac=00:16:3e:12:c3:fa']
device_model_version="qemu-xen-traditional"
gfx_passthru=0
xen_platform_pci=1
pci  = [ '01:00.0', '01:00.1' ]
pci_msitranslate = 1
pci_power_mgmt = 1
pci_permissive = 1
xen_extended_power_mgmt = 1
acpi_s3 = 1
acpi_s4 = 1
disk = [ 'phy:/dev/t0vg/debian-testing,xvda,w' ];

And I'm running on Xen 4.3.1 with NVIDIA driver 331.20 x86_64 in the
domU.

Thanks and cheers!


Links:
------
[1] http://lists.xenproject.org/archives/html/xen-users/2013-09/msg00106.html

