[Xen-devel] How to deal with hypercalls returning -EFAULT
Currently the release of Xen 4.11 is blocked due to a sporadic failure of
the OSSTEST guest-saverestore[.2]. During that test a hypercall issued by
libxc via the Linux privcmd driver returns -EFAULT in spite of all
hypercall buffers being locked in memory via mlock() (or similar flags
specified in a mmap() call).

My analysis has revealed that modern Linux kernels might make such locked
user pages inaccessible for very short periods of time. This can happen
e.g. when pages are subject to compaction or migration.

There are multiple ways to mitigate this problem:

1. Trying to switch page migration or compaction off in dom0.
   Pros:
   - no change in Xen necessary
   Cons:
   - new cases might come up in the future
   - easy to miss, failures are really very sporadic and might happen
     only after updating the kernel

2. Add a bandaid to Xen tools by retrying hypercalls which have failed
   with -EFAULT (either for all or only for some hypercalls).
   Pros:
   - no interface change necessary
   Cons:
   - not all hypercalls might be just repeatable
   - problem isn't solved but just worked around

3. Modify the interface to the privcmd driver to pass information about
   used buffers to the kernel in order to lock them there. Either add a
   new interface for hypercall buffer management or add the list of
   buffers to the privcmd ioctl data structure.
   Pros:
   - problem is really solved
   Cons:
   - split solution between kernel and Xen, both must be changed

4. Modify the interface between hypervisor and kernel: instead of just
   returning -EFAULT let the hypervisor behave more like copy_to_user by
   raising a page fault which can then be fixed up in the kernel. This
   change must be activated by the kernel, of course.
   Pros:
   - rather simple change in the kernel "doing the right thing"
   - hypercall bounce buffer handling in libxc/libxencall can be switched
     off for a kernel supporting this change
   Cons:
   - split solution between kernel and Xen, both must be changed
   - not sure how complex the required hypervisor change will be

It should be noted that we can either select only one of the above
solutions, or one of 3/4 and additionally one of 1/2 as a fallback for
old kernels.

How to proceed? I'd like to have an answer as fast as possible to unblock
the 4.11 release.

Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
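To make option 2 a bit more concrete, here is a minimal sketch of what a retry-on-EFAULT wrapper in the tools might look like. It is not taken from libxc or libxencall: the names hypercall_fn, hypercall_retry_efault and flaky_hypercall are hypothetical, and the callback merely stands in for a privcmd hypercall ioctl that returns -1 with errno set on failure. As noted in the cons, this would only be safe for hypercalls which can simply be repeated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one hypercall issued via the privcmd driver.
 * Assumed to follow the ioctl(2) convention: >= 0 on success, -1 with
 * errno set on failure. */
typedef int (*hypercall_fn)(void *arg);

/* Issue the hypercall up to max_tries times in total, repeating it only
 * when it failed with EFAULT. Any other result is returned immediately.
 * Only suitable for hypercalls that are safe to repeat. */
static int hypercall_retry_efault(hypercall_fn fn, void *arg,
                                  unsigned int max_tries)
{
    int rc = -1;

    assert(fn != NULL && max_tries > 0);

    for (unsigned int i = 0; i < max_tries; i++) {
        rc = fn(arg);
        if (rc >= 0 || errno != EFAULT)
            break;
        /* EFAULT: a locked buffer page may have been transiently
         * inaccessible (e.g. during migration), so try again. */
    }

    return rc;
}

/* Demo helper only: fails with EFAULT until the counter that arg points
 * to reaches zero, then succeeds. */
static int flaky_hypercall(void *arg)
{
    int *remaining_failures = arg;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        errno = EFAULT;
        return -1;
    }
    return 0;
}
```

With a counter of 2 and max_tries of 5 the wrapper succeeds on the third attempt. With more pending failures than max_tries it gives up and still reports EFAULT, so callers keep seeing the original error in the worst case.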