[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults



On Thu, Nov 22, 2018 at 02:48:07PM +0200, Razvan Cojocaru wrote:
> On 11/22/18 12:58 PM, Roger Pau Monné wrote:
> > On Thu, Nov 22, 2018 at 12:14:59PM +0200, Razvan Cojocaru wrote:
> >> On 11/22/18 12:05 PM, Roger Pau Monné wrote:
> >>> On Wed, Nov 21, 2018 at 08:55:48PM +0200, Razvan Cojocaru wrote:
> >>>> On 11/16/18 7:04 PM, Roger Pau Monné wrote:
> >>>>>> +            if ( a == v )
> >>>>>> +                continue;
> >>>>>> +
> >>>>>> +            /* Pause, synced. */
> >>>>>> +            while ( !a->arch.in_host )
> >>>>> Why not use a->is_running as a way to know whether the vCPU is
> >>>>> running?
> >>>>>
> >>>>> I think the logic of using vcpu_pause and expecting the running vcpu
> >>>>> to take a vmexit and thus set in_host is wrong because a vcpu that
> >>>>> wasn't running when vcpu_pause_nosync is called won't get scheduled
> >>>>> anymore, thus not taking a vmexit and this function will lockup.
> >>>>>
> >>>>> I don't think you need the in_host boolean at all.
> >>>>>
> >>>>>> +                cpu_relax();
> >>>>> Is this really better than using vcpu_pause?
> >>>>>
> >>>>> I assume this is done to avoid waiting on each vcpu, and instead doing
> >>>>> it here likely means less wait time?
> >>>>
> >>>> The problem with plain vcpu_pause() is that we weren't able to use it,
> >>>> for the same reason (which remains unclear as of yet) that we couldn't
> >>>> use a->is_running: we get CPU stuck hypervisor crashes that way. Here's
> >>>> one that uses the same logic, but loops on a->is_running instead of
> >>>> !a->arch.in_host:
> > 
> > [...]
> > 
> >>>> Some scheduler magic appears to happen here where it is unclear why
> >>>> is_running doesn't seem to end up being 0 as expected in our case. We'll
> >>>> keep digging.
> >>>
> >>> There seems to be some kind of deadlock between
> >>> vmx_start_reexecute_instruction and hap_track_dirty_vram/handle_mmio.
> >>> Are you holding a lock while trying to put the other vcpus to sleep?
> >>
> >> d->arch.rexec_lock, but I don't see how that would matter in this case.
> > 
> > The trace from pCPU#0:
> > 
> > (XEN) [ 3668.016989] RFLAGS: 0000000000000202   CONTEXT: hypervisor (d0v0)
> > [...]
> > (XEN) [ 3668.275417] Xen call trace:
> > (XEN) [ 3668.278714]    [<ffff82d0801327d2>] vcpu_sleep_sync+0x40/0x71
> > (XEN) [ 3668.284952]    [<ffff82d08010735b>] 
> > domain.c#do_domain_pause+0x33/0x4f
> > (XEN) [ 3668.291973]    [<ffff82d08010879a>] domain_pause+0x25/0x27
> > (XEN) [ 3668.297952]    [<ffff82d080245e69>] 
> > hap_track_dirty_vram+0x2c1/0x4a7
> > (XEN) [ 3668.304797]    [<ffff82d0801dd8f5>] do_hvm_op+0x18be/0x2b58
> > (XEN) [ 3668.310864]    [<ffff82d080172aca>] pv_hypercall+0x1e5/0x402
> > (XEN) [ 3668.317017]    [<ffff82d080250899>] entry.o#test_all_events+0/0x3d
> > 
> > Shows there's an hypercall executed from Dom0 that's trying to pause
> > the domain, thus pausing all the vCPUs.
> > 
> > Then pCPU#3:
> > 
> > (XEN) [ 3669.062841] RFLAGS: 0000000000000202   CONTEXT: hypervisor (d1v0)
> > [...]
> > (XEN) [ 3669.322832] Xen call trace:
> > (XEN) [ 3669.326128]    [<ffff82d08021006a>] 
> > vmx_start_reexecute_instruction+0x107/0x68a
> > (XEN) [ 3669.333925]    [<ffff82d080210b3e>] 
> > p2m_mem_access_check+0x551/0x64d
> > (XEN) [ 3669.340774]    [<ffff82d0801dee9e>] 
> > hvm_hap_nested_page_fault+0x2f2/0x631
> > (XEN) [ 3669.348051]    [<ffff82d080202c00>] 
> > vmx_vmexit_handler+0x156c/0x1e45
> > (XEN) [ 3669.354899]    [<ffff82d08020820c>] 
> > vmx_asm_vmexit_handler+0xec/0x250
> > 
> > Seems to be blocked in vmx_start_reexecute_instruction, and thus not
> > getting paused and triggering the watchdog on pCPU#0?
> > 
> > You should check on which vCPU is the trace from pCPU#0 waiting, if
> > that's the vCPU running on pCPU#3 (d1v0) you will have to check what's
> > taking such a long time in vmx_start_reexecute_instruction.
> 
> Right, so this is what appears to be happening, if the output of my test
> is to be trusted: https://pastebin.com/YEDqNuwh
> 
> 1. vmx_start_reexecute_instruction() pauses all VCPUs but self (which
> appears to be VCPU 1):
> 
> (XEN) [  195.427141] 0 pause_count 0
> (XEN) [  195.427142] 2 pause_count 0
> (XEN) [  195.427143] 3 pause_count 0
> (XEN) [  195.427144] 4 pause_count 0
> (XEN) [  195.427146] 5 pause_count 0
> (XEN) [  195.427147] 6 pause_count 0
> (XEN) [  195.427148] 7 pause_count 0

The diff below doesn't show where you add this message, neither
what's actually printing. I guess the first number is the vCPU ID, and
the second the value of pause_count at some point?

> 
> 2. The hypercall happens, which calls domain_pause(), which I've
> modified thus:
> 
> @@ -959,7 +961,10 @@ static void do_domain_pause(struct domain *d,
>      atomic_inc(&d->pause_count);
> 
>      for_each_vcpu( d, v )
> +    {
> +        printk("domain_pause %d\n", v->vcpu_id);

Could you print both the domain and the vcpu ids?

>          sleep_fn(v);
> +    }
> 
>      arch_domain_pause(d);
>  }
> 
> and which says:
> 
> (XEN) [  195.492064] domain_pause 0

This is the hypercall code waiting for domain 1 vCPU 0 to pause?

> 
> 3. At this point, according to addr2line,
> vmx_start_reexecute_instruction() does "while ( a->is_running )
> cpu_relax();" for all VCPUs but itself.

Why don't you just start by using:

for_each_vcpu( d, v )
    if ( v != current )
        vcpu_pause(v);

Instead of open-coding it in vmx_start_reexecute_instruction.

> Now, d1v0, which, if I'm reading this correctly, is the VCPU that
> domain_pause() is stuck waiting for, does:
> 
> (XEN) [  200.829874] Xen call trace:
> (XEN) [  200.833166]    [<ffff82d0801278c6>]
> queue_read_lock_slowpath+0x25/0x4d
> (XEN) [  200.840186]    [<ffff82d08020c1f6>]
> get_page_from_gfn_p2m+0x14e/0x3b0
> (XEN) [  200.847121]    [<ffff82d080247213>]
> hap_p2m_ga_to_gfn_4_levels+0x48/0x299
> (XEN) [  200.854400]    [<ffff82d080247480>]
> hap_gva_to_gfn_4_levels+0x1c/0x1e
> (XEN) [  200.861331]    [<ffff82d08021275c>] paging_gva_to_gfn+0x10e/0x11d
> (XEN) [  200.867918]    [<ffff82d0801d66e0>] hvm.c#__hvm_copy+0x98/0x37f
> (XEN) [  200.874329]    [<ffff82d0801d848d>]
> hvm_fetch_from_guest_virt_nofault+0x14/0x16
> (XEN) [  200.882130]    [<ffff82d0801d141a>]
> emulate.c#_hvm_emulate_one+0x118/0x2bc
> (XEN) [  200.889496]    [<ffff82d0801d16b4>] hvm_emulate_one+0x10/0x12
> (XEN) [  200.895735]    [<ffff82d0801e0902>] handle_mmio+0x52/0xc9
> (XEN) [  200.901626]    [<ffff82d0801e09ba>]
> handle_mmio_with_translation+0x41/0x43
> (XEN) [  200.908994]    [<ffff82d0801ded1f>]
> hvm_hap_nested_page_fault+0x133/0x631
> (XEN) [  200.916271]    [<ffff82d080202c40>]
> vmx_vmexit_handler+0x156c/0x1e45
> (XEN) [  200.923117]    [<ffff82d08020824c>]
> vmx_asm_vmexit_handler+0xec/0x250

What lock is it waiting on? Is this the paging lock? If so you will
have to figure out who is holding this lock.

Is this on top of plain staging, or do you have other changes applied
to Xen?

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.