
Re: [Xen-devel] domU panic on nested call to arch_enter_lazy_mmu_mode()



On Wed, Apr 10, 2013 at 11:35:35AM -0400, Andrew Jones wrote:
> Hi all,
> 
> A couple years ago a thread[1] popped up here for a bug report that
> Jeremy followed up to with this patch[2]. That patch was never
> committed though (likely because the issue was difficult to
> reproduce/test). We've got a report now of the same issue for the
> rhel6 kernel running on EC2. It's pretty certain that it's the same,
> because the reproducer steps[3] given would certainly generate the
> same call sequences shown in [1], and applying the proposed patch[2]
> to the rhel6 kernel fixes it.
> 
> Now, while the grant table code has changed some between what rhel6
> has and recent kernels, I believe the issue should still be present
> with recent kernels. However, we attempted to reproduce using a
> Fedora18 kernel (>3.8) and could not. So I'm writing to see if I'm
> missing something in my analysis - meaning upstream is no longer at
> risk of hitting this bug, and/or if Jeremy's proposed patch was
> rejected for other reasons than not being testable (or just
> forgotten). If not, then I'd suggest we repost it.

The logic behind arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() was
that the pair would run uninterrupted within kernel context. That is, once
lazy mode is entered it is left again before user-space gets a chance to
run (which is, btw, the issue that Chuck spotted). There were a couple of
bugs where that did not hold, and they have been fixed (I can't remember
the exact ones, but a git log --grep="lazy" should give some idea).

Most of the issues were not in the Xen code but in generic code, such
as vmalloc, and some other places. For example:

commit 1160c2779b826c6f5c08e5cc542de58fd1f667d5
Author: Samu Kallio <samu.kallio@xxxxxxxxxxxxxxxxx>
Date:   Sat Mar 23 09:36:35 2013 -0400

    x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates


But if you find this re-appearing, please do report it so we can
either track it down or apply that patch (with a WARN added), so
that customers can still use the kernel while we identify
the issues.
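
As a rough illustration of the "add some WARN" idea, here is a small
userspace model, not kernel code. It assumes the panic comes from a hard
assertion on nested entry, and every name in it (model_enter_lazy_mmu,
model_leave_lazy_mmu, nested_enter_warnings) is made up for this sketch.
A nested enter is reported and tolerated instead of being fatal:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum lazy_mode { LAZY_NONE, LAZY_MMU };

static enum lazy_mode current_mode = LAZY_NONE;
static unsigned int nested_enter_warnings;

/*
 * Enter lazy MMU mode. Returns true on a clean entry, false if lazy mode
 * was already active. The nested case is counted and logged rather than
 * treated as fatal, so the caller keeps running and the event is visible.
 */
static bool model_enter_lazy_mmu(void)
{
        if (current_mode != LAZY_NONE) {
                nested_enter_warnings++;
                fprintf(stderr, "WARN: nested enter of lazy MMU mode\n");
                return false;
        }
        current_mode = LAZY_MMU;
        return true;
}

/*
 * Leave lazy MMU mode. A real implementation would flush any batched
 * updates here; the model only resets the state, so a second leave after
 * a nested enter is harmless.
 */
static void model_leave_lazy_mmu(void)
{
        current_mode = LAZY_NONE;
}
```

Calling model_enter_lazy_mmu() twice without a leave in between bumps
nested_enter_warnings once and leaves the mode set to LAZY_MMU. The
trade-off in this model is that the inner leave ends lazy mode early for
the outer caller, which costs batching but not correctness.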

> 
> Thanks,
> drew
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2010-12/msg00440.html
> [2] http://lists.xen.org/archives/html/xen-devel/2010-12/msg00505.html
> [3] Reproducer steps
> 1. Start an instance of the Amazon EC2 instance type c1.xlarge.
>    (c1.xlarge has 8 cores)
> 
> 2. Create 7 file systems (ext3) on top of Amazon EBS volumes.
> 
> 3. Mount the 7 file systems you created.
> 
> 4. To increase page table operations, create the following program:
> 
> --
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> 
> int main(void)
> {
>         int status;
>         pid_t pid; 
>         for (;;) {
>                 pid = fork();
>                 if (pid == 0) {
>                         return 0;
>                 }
>                 wait(&status);
>         }
> }
> --
> 
> 5. Run the program pinned to CPU0:
> 
> # gcc fork.c
> # taskset -c 0 ./a.out  
> 
> 
> 6. To exercise the grant table, write to the 7 EBS volumes simultaneously.
>    (c1.xlarge has 8 CPUs, so run the writes on CPU1-CPU7, i.e. on every
>    CPU except CPU0.)
> 
> For instance:
> --
> for i in `seq 1 7`;
> do
>         taskset -c $i dd if=/dev/zero of=/mnt/$i/testfile bs=10M count=10000 oflag=direct &
> done
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
> 
