Re: [Xen-devel] Some trouble to use NVIDIA CUDA with Xen
Hello.

On Thu, 15 Aug 2013, Konrad Rzeszutek Wilk wrote:

http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/wb_to_wc/wb_to_wc.c;h=cd2439ac103150229f14f732a9a7a271ca6f397e;hb=HEAD
to double check that it is working correctly).

I will try @weekend.

I tried. I have NO solution, but there are questions at the END.

====================================================
Testing:
- enabled verbose debugging in the nvidia module
  ("make clean module DEFINES='-DDEBUG -DNV_MEM_LOGGER -DNV_DBG_MEM'" + "os-interface.c:cur_debuglevel = 0x0")
- added some more debug strings (additional tag "MX")
- I attached the debug output ("dmesg | grep NVRM > out.txt")
- tested program: CUDA 5.5 "bandwidthTest", nvdriver 319.37, linux 3.9.11-200.PAT1.fc18.x86_64, xen 4.2.2, GTX770 on pci2:0.0
- I loaded the module wb_to_wc.ko but it does not help much

====================================================
Observation:
1) nv-xen.h - the functions are never called (they are probably meant for DomU)
2) memory debugging shows that the UC mark and WB unmark pairs work OK
   - look at out.txt
     ("egrep 'nv_alloc_pages:2481.*flags = 0x000[12]|nv_free_pages:2510.*flags = 0x000[12]' out.txt")
   - search for "nv_alloc_pages" and "cache_type" 1 (NV_MEMORY_UNCACHED) or 2 (NV_MEMORY_WRITECOMBINED)
     - calls set_memory_array_uc() (==MX_AR_UC tag) or set_memory_uc() (==MX_UC tag)
     - the corresponding flags are set in the page structure
       - see "flags" (struct page) in the page_table dump
   - search for "nv_free_pages" and "flags = 0x0001xxxx" / "flags = 0x0002xxxx" (nvidia page)
     - calls set_memory_array_wb() (==MX_AR_WB tag) or set_memory_wb() (==MX_WB tag)
     - the corresponding flags are cleared in the page structure (see page_table before and after the *_WB tag)

====================================================
Oddness:
1) Why, when "NV_MEMORY_WRITECOMBINED" is requested, is the memory allocated with set_memory*_uc() and NOT set_memory*_wc()
   (both NV_MEMORY_WRITECOMBINED and NV_MEMORY_UNCACHED are allocated as UC)?
   (for example timestamp [ 4659.741768] in out.txt)

code (nv-vm.c:nv_alloc_system_pages()):
----
    if (!NV_ALLOC_MAPPING_CACHED(at->flags))
        nv_set_memory_type(at, NV_MEMORY_UNCACHED);
---

2) Why is the block allocated in "1)" (i.e. requested as NV_MEMORY_WRITECOMBINED but marked with set_memory*_uc()) flagged as WC in nv-mmap.c:nv_kern_mmap()?
   "vm_page_prot" is encoded MANUALLY in nv-mmap.c:nv_encode_caching()!
   (for example timestamp [ 4659.902599] in out.txt)

code nv-mmap.c:nv_encode_caching():
---
    switch (cache_type)
    {
        case NV_MEMORY_WRITECOMBINED:
            if ((nv_pat_mode != NV_PAT_MODE_DISABLED) &&
                (memory_type != NV_MEMORY_TYPE_REGISTERS))
            {
                pgprot_val(*prot) &= ~(_PAGE_PSE | _PAGE_PCD | _PAGE_PWT);
                *prot = __pgprot(pgprot_val(*prot) | _PAGE_PWT);
                break;
            }
---

code nv-mmap.c:nv_kern_mmap():
---
    for (j = i; j < (i + pages); j++)
    {
        nv_verify_page_mappings(at->page_table[j],
                                NV_ALLOC_MAPPING(at->flags));
        if (NV_REMAP_PAGE_RANGE(start, at->page_table[j]->phys_addr,
                                PAGE_SIZE, vma->vm_page_prot))
        {
            NV_ATOMIC_DEC(at->usage_count);
            status = -EAGAIN;
            goto done;
        }
        start += PAGE_SIZE;
    }
---
(NV_REMAP_PAGE_RANGE() == remap_pfn_range())

====================================================
Questions:
1) Is it a problem when the same pages are flagged as UC in the kernel and mmapped to userspace as WC?

2) Is it OK to manually encode WC for "remap_pfn_range()" (is it remapped to a real Xen-aware PTE later? xen_pte_val?)?
   WC is manually encoded as "_PAGE_PWT", i.e. it selects entry PAT1. In a non-Xen kernel PAT1 is mapped to memory type "01H" == "Write Combining (WC)",
   BUT in a Xen kernel PAT1 is mapped to "04H" == "Write Through (WT)".
   A Xen kernel should use "_PAGE_PAT", i.e. select entry PAT4, which is mapped to memory type "01H" (xen rdmsr 0x277 == 50100070406).
   (Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide, Part 1, chapter 11.12 PAGE ATTRIBUTE TABLE (PAT)).
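To make question 2 concrete, here is a minimal C sketch (not taken from the NVIDIA driver or from Xen) that decodes which PAT entry a PTE's PWT/PCD/PAT bits select and which memory type that entry holds. The Xen value is the one read with rdmsr 0x277 above. The native Linux value 0x0007010600070106 is an assumption based on the kernel's usual PAT setup, and the names pat_index, pat_entry, pat_type_name and report_mapping are hypothetical helpers introduced only for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Cache-control bits of a 4 KiB x86 page-table entry (Intel SDM vol. 3A). */
#define PTE_PWT (1ULL << 3)
#define PTE_PCD (1ULL << 4)
#define PTE_PAT (1ULL << 7)

/* IA32_PAT (MSR 0x277) as read under Xen in this mail. */
#define PAT_MSR_XEN          0x0000050100070406ULL
/* ASSUMPTION: PAT layout programmed by a native (non-Xen) Linux kernel. */
#define PAT_MSR_LINUX_NATIVE 0x0007010600070106ULL

/* PAT entry selected by a PTE: index = PAT*4 + PCD*2 + PWT. */
static inline unsigned pat_index(uint64_t pte)
{
    return ((pte & PTE_PAT) ? 4u : 0u) |
           ((pte & PTE_PCD) ? 2u : 0u) |
           ((pte & PTE_PWT) ? 1u : 0u);
}

/* Memory type stored in entry idx: low 3 bits of byte idx of the MSR. */
static inline unsigned pat_entry(uint64_t pat_msr, unsigned idx)
{
    return (unsigned)((pat_msr >> (8u * idx)) & 0x07u);
}

/* Human-readable name of a PAT memory-type encoding. */
static inline const char *pat_type_name(unsigned type)
{
    switch (type) {
    case 0x00: return "UC";
    case 0x01: return "WC";
    case 0x04: return "WT";
    case 0x05: return "WP";
    case 0x06: return "WB";
    case 0x07: return "UC-";
    default:   return "reserved";
    }
}

/* Print what the cache bits of one PTE mean under both PAT layouts. */
static inline void report_mapping(const char *label, uint64_t pte)
{
    unsigned idx = pat_index(pte);

    printf("%-10s -> PAT%u: native=%s xen=%s\n", label, idx,
           pat_type_name(pat_entry(PAT_MSR_LINUX_NATIVE, idx)),
           pat_type_name(pat_entry(PAT_MSR_XEN, idx)));
}
```

Calling report_mapping("_PAGE_PWT", PTE_PWT) and report_mapping("_PAGE_PAT", PTE_PAT) from a main() shows the mismatch described above: _PAGE_PWT alone selects PAT1, which is WC in the assumed native layout but WT under Xen, while _PAGE_PAT alone selects PAT4, which is WC under Xen.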
====================================================
Problem still persists:

When I use CUDA the system becomes unstable and sometimes crashes.

[17037.717699] systemd-udevd[9160]: segfault at 18 ip 00007ff415c126d3 sp 00007fff742bfa50 error 4 in libc-2.16.so[7ff415b57000+1ad000]
[17037.863424] BUG: Bad rss-counter state mm:ffff880071b15180 idx:1 val:10
[17040.876791] systemd-udevd[9161]: segfault at 3f21200ed0 ip 0000003f21200ed0 sp 00007fff742bf968 error 14 in libnss_files-2.16.so[7ff4144d0000+c000]
[17040.898748] BUG: Bad rss-counter state mm:ffff880071b17100 idx:1 val:6
[17047.662793] bash[9191]: segfault at 10 ip 0000003f20e7d0dd sp 00007fff1ebd95d0 error 4 in libc-2.16.so[3f20e00000+1ad000]
[17047.821840] BUG: Bad rss-counter state mm:ffff880053cbb800 idx:1 val:487

====================================================
Thanks for any answers,
Martin Cerveny

Attachment: out.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel