Re: [Xen-devel] Some trouble to use NVIDIA CUDA with Xen
Hello.

On Thu, 15 Aug 2013, Konrad Rzeszutek Wilk wrote:
http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/wb_to_wc/wb_to_wc.c;h=cd2439ac103150229f14f732a9a7a271ca6f397e;hb=HEAD
to double check that it is working correctly).

I will try @weekend.
I tried. I have NO solution, but there are questions at the END.
====================================================
Testing:
- enabled verbose debugging in the nvidia module ("make clean module
  DEFINES='-DDEBUG -DNV_MEM_LOGGER -DNV_DBG_MEM'" + "os-interface.c:cur_debuglevel = 0x0")
- added some more debug strings (additional tag "MX")
- attached the debug output ("dmesg | grep NVRM > out.txt")
- tested program: CUDA 5.5 "bandwidthTest", nvdriver 319.37, linux
  3.9.11-200.PAT1.fc18.x86_64, xen 4.2.2, GTX770 on pci2:0.0
- loaded module wb_to_wc.ko, but it does not help much
====================================================
Observation:
1) nv-xen.h - the functions are never called (they are probably meant for DomU)
2) memory debugging shows that the UC mark and WB unmark pairs work OK
- look at out.txt ("egrep 'nv_alloc_pages:2481.*flags = 0x000[12]|nv_free_pages:2510.*flags = 0x000[12]' out.txt")
- search for "nv_alloc_pages" and "cache_type" 1 (NV_MEMORY_UNCACHED) or 2
  (NV_MEMORY_WRITECOMBINED)
- these call set_memory_array_uc() (== MX_AR_UC tag) / set_memory_uc() (== MX_UC
  tag)
- the corresponding flags are set in the page structure - see "flags" (struct page)
  in the page_table dump
- search for "nv_free_pages" and "flags = 0x0001xxxx" / "flags = 0x0002xxxx"
  (nvidia page)
- these call set_memory_array_wb() (== MX_AR_WB tag) / set_memory_wb() (== MX_WB
  tag)
- the corresponding flags are cleared in the page structure (see page_table before
  and after the *_WB tag)
====================================================
Oddness:
1) Why, when "NV_MEMORY_WRITECOMBINED" is requested, is the memory set up with
set_memory*_uc() and NOT set_memory*_wc() (both NV_MEMORY_WRITECOMBINED and
NV_MEMORY_UNCACHED are allocated as UC)?
(for example timestamp [ 4659.741768] in out.txt)
code (nv-vm.c:nv_alloc_system_pages()):
---
    if (!NV_ALLOC_MAPPING_CACHED(at->flags))
        nv_set_memory_type(at, NV_MEMORY_UNCACHED);
---
2) Why is the block allocated in "1)" (i.e. requested as NV_MEMORY_WRITECOMBINED
but set up with set_memory*_uc()) then flagged as WC in nv-mmap.c:nv_kern_mmap()?
"vm_page_prot" is encoded MANUALLY in nv-mmap.c:nv_encode_caching()!
(for example timestamp [ 4659.902599] in out.txt)
code nv-mmap.c:nv_encode_caching():
---
    switch (cache_type)
    {
        case NV_MEMORY_WRITECOMBINED:
            if ((nv_pat_mode != NV_PAT_MODE_DISABLED) &&
                    (memory_type != NV_MEMORY_TYPE_REGISTERS))
            {
                pgprot_val(*prot) &= ~(_PAGE_PSE | _PAGE_PCD | _PAGE_PWT);
                *prot = __pgprot(pgprot_val(*prot) | _PAGE_PWT);
                break;
            }
---
code nv-mmap.c:nv_kern_mmap():
---
    for (j = i; j < (i + pages); j++)
    {
        nv_verify_page_mappings(at->page_table[j],
                                NV_ALLOC_MAPPING(at->flags));
        if (NV_REMAP_PAGE_RANGE(start, at->page_table[j]->phys_addr,
                                PAGE_SIZE, vma->vm_page_prot))
        {
            NV_ATOMIC_DEC(at->usage_count);
            status = -EAGAIN;
            goto done;
        }
        start += PAGE_SIZE;
    }
---
(NV_REMAP_PAGE_RANGE() == remap_pfn_range())
====================================================
Questions:
1) Is it a problem when the same pages are flagged as UC in the kernel and
mmapped to userspace as WC?
2) Is it OK to manually encode WC for "remap_pfn_range()" (is it remapped to a
real Xen-aware PTE later? xen_pte_val?)?
Manually encoded WC is "_PAGE_PWT", i.e. it selects entry PAT1, which in a
non-Xen kernel is mapped to memory type "01H" == "Write Combining (WC)", BUT in
a Xen kernel PAT1 is mapped to "04H" == "Write Through (WT)".
A Xen kernel should use "_PAGE_PAT", i.e. select entry PAT4, which is mapped to
memory type "01H" (xen rdmsr 0x277 == 50100070406).
(Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A:
System Programming Guide, Part 1, chapter 11.12 PAGE ATTRIBUTE TABLE (PAT))
====================================================
Problem still persists:
If I use CUDA, the system becomes unstable and sometimes crashes.
[17037.717699] systemd-udevd[9160]: segfault at 18 ip 00007ff415c126d3 sp 00007fff742bfa50 error 4 in libc-2.16.so[7ff415b57000+1ad000]
[17037.863424] BUG: Bad rss-counter state mm:ffff880071b15180 idx:1 val:10
[17040.876791] systemd-udevd[9161]: segfault at 3f21200ed0 ip 0000003f21200ed0 sp 00007fff742bf968 error 14 in libnss_files-2.16.so[7ff4144d0000+c000]
[17040.898748] BUG: Bad rss-counter state mm:ffff880071b17100 idx:1 val:6
[17047.662793] bash[9191]: segfault at 10 ip 0000003f20e7d0dd sp 00007fff1ebd95d0 error 4 in libc-2.16.so[3f20e00000+1ad000]
[17047.821840] BUG: Bad rss-counter state mm:ffff880053cbb800 idx:1 val:487
====================================================
Thanks for answers, Martin Cerveny

Attachment: out.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel