
Re: [Xen-devel] Locking on vm-event operations (monitor)



On 07/22/2016 02:39 PM, Corneliu ZUZU wrote:
> On 7/22/2016 1:29 PM, Razvan Cojocaru wrote:
>> On 07/22/2016 01:17 PM, Corneliu ZUZU wrote:
>>> On 7/22/2016 12:55 PM, Razvan Cojocaru wrote:
>>>> On 07/22/2016 12:27 PM, Corneliu ZUZU wrote:
>>>>> Hi,
>>>>>
>>>>> I've been inspecting the vm-event code to try to understand when and
>>>>> why domain pausing/locking is done. If I understood correctly, domain
>>>>> pausing is done solely to force all the vCPUs of that domain to see a
>>>>> configuration update and act upon it (e.g. in the case of a
>>>>> XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring,
>>>>> domain pausing/unpausing ensures immediate enabling of CR3
>>>>> load-exiting for all vCPUs), not to synchronize concurrent operations
>>>>> (lock-behavior).
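>>>>>
>>>>> To illustrate, the XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG handler in
>>>>> arch_monitor_domctl_event() does roughly this (a simplified sketch
>>>>> from memory, not the exact code):
>>>>>
>>>>>     domain_pause(d);    /* every vCPU is descheduled from here on */
>>>>>
>>>>>     ad->monitor.write_ctrlreg_enabled |= ctrlreg_bitmask;
>>>>>
>>>>>     if ( mop->u.mov_to_cr.index == VM_EVENT_X86_CR3 )
>>>>>         /* Latch the new CR3 mask, i.e. enable CR3 load-exiting. */
>>>>>         for_each_vcpu ( d, v )
>>>>>             hvm_update_guest_cr(v, 0);
>>>>>
>>>>>     domain_unpause(d);  /* vCPUs resume and see the new setting */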
>>>>>
>>>>> As for locking, I see that, for example, vm_event_init_domain(),
>>>>> vm_event_cleanup_domain() and monitor_domctl() are all protected by
>>>>> the domctl-lock, but I don't think that's enough.
>>>>> Here are a few code-paths that led me to that conclusion:
>>>>> * do_hvm_op()->monitor_guest_request() reads d->monitor.guest_request_*
>>>>> resources, but it doesn't seem to take the domctl lock, so it seems
>>>>> possible for that to happen _while_ those resources are being
>>>>> initialized/cleaned-up (see the sketch below)
>>>>> * monitor_guest_request() also calls
>>>>> monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot(),
>>>>> which attempts vm_event_ring_lock(ved); that could likewise run
>>>>> _while_ the ring is being initialized (vm_event_enable()) or
>>>>> cleaned-up (vm_event_disable())
>>>>> * hvm_monitor_cr() - e.g. the code-path
>>>>> vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX()
>>>>> doesn't seem to take into account the possibility of a concurrent
>>>>> monitor_init_domain()/monitor_cleanup_domain()
>>>>>
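>>>>> To make the first point concrete, monitor_guest_request() looks
>>>>> roughly like this (quoting from memory, details may differ):
>>>>>
>>>>>     void monitor_guest_request(void)
>>>>>     {
>>>>>         struct vcpu *curr = current;
>>>>>         struct domain *d = curr->domain;
>>>>>
>>>>>         /* Racy: read without the domctl lock, so this can overlap
>>>>>          * with monitor init/cleanup for the domain. */
>>>>>         if ( d->monitor.guest_request_enabled )
>>>>>         {
>>>>>             vm_event_request_t req = {
>>>>>                 .reason = VM_EVENT_REASON_GUEST_REQUEST,
>>>>>                 .vcpu_id = curr->vcpu_id,
>>>>>             };
>>>>>
>>>>>             monitor_traps(curr, d->monitor.guest_request_sync, &req);
>>>>>         }
>>>>>     }
>>>>>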
>>>>> Am I missing something with these conclusions?
>>>> Your conclusions look correct, but I assume that the reason why this
>>>> has not been addressed in the past is that introspection applications
>>>> are expected to be well-behaved.
>>> I wouldn't think that was the rationale (considering that the risk is
>>> crashing the hypervisor as a whole; also see below); I'd rather think
>>> this simply went unnoticed.
>>>
>>>> Specifically, in a codebase where the
>>>> choice between uint64_t and long int matters speed-wise, and where
>>>> unlikely()s also matter, an extra lock may be an issue.
>>>>
>>>> The typical flow of an introspection application is:
>>>>
>>>> 1. Initialize everything.
>>>> 2. Subscribe to relevant events.
>>>> 3. Event processing loop.
>>>> 4. Unsubscribe from events.
>>>> 5. Do a last run of event processing (of what's already queued in the
>>>> ring buffer).
>>>> 6. Uninitialize everything (no events are possible here because of
>>>> steps 4-5).
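>>>>
>>>> In libxc terms, that's roughly the following (sketch only; check
>>>> xenctrl.h for the exact signatures, which may differ):
>>>>
>>>>     #include <xenctrl.h>
>>>>
>>>>     xc_interface *xch = xc_interface_open(NULL, NULL, 0);
>>>>     domid_t domid = 1;  /* the introspected domain */
>>>>     uint32_t port;
>>>>
>>>>     /* 1. Initialize everything (sets up the shared monitor ring). */
>>>>     void *ring_page = xc_monitor_enable(xch, domid, &port);
>>>>
>>>>     /* 2. Subscribe to relevant events. */
>>>>     xc_monitor_guest_request(xch, domid, 1 /* enable */, 1 /* sync */);
>>>>
>>>>     /* 3. Event processing loop: wait on the event channel bound to
>>>>      *    'port', consume requests from the ring, push responses. */
>>>>
>>>>     /* 4. Unsubscribe from events. */
>>>>     xc_monitor_guest_request(xch, domid, 0, 0);
>>>>
>>>>     /* 5. Drain the requests still queued in the ring. */
>>>>
>>>>     /* 6. Uninitialize everything. */
>>>>     xc_monitor_disable(xch, domid);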
>>> And even if an introspection application behaves as sanely as you say,
>>> the current layout of the code still doesn't guarantee bug-free
>>> behavior. For one thing, there are in-guest events (which trigger
>>> accesses of vm-event resources) that are completely asynchronous
>>> relative to steps 2-6, for example HVMOP_guest_request_vm_event
>>> (which subsequently calls monitor_guest_request()).
>> Not true. As part of step 4 d->monitor.guest_request_enabled will become
>> 0, so no sensitive code will run after that (see the
>> guest_request_enabled check quoted above).
> 
> I wasn't being rigorous with steps 2-6 above (I only meant to say
> 'there can't be any vm-events before everything is initialized'), and it
> seems true that at least in the monitor_guest_request() case this might
> not be able to produce a hypervisor crash (although there are a number
> of ugly inconsistencies on that code path). But nonetheless a hypervisor
> crash can _still_ happen even with a _sane_ toolstack user.
> 
> Look at hvm_do_resume():
> 
> 1. Suppose you, as a toolstack user, get to sane step no. 6: "Uninitialize
> everything (no events are possible here because of steps 4-5)."
> 2. But just before you do that, hvm_do_resume() happens on an
> arbitrary vCPU of the domain and gets to this point:
> 
>     if ( unlikely(v->arch.vm_event) )
>     {
>         // ---> let's say you are now at this point,
>         // just after the above non-NULL check
>         struct monitor_write_data *w = &v->arch.vm_event->write_data;
> 
> 3. and before proceeding further, the uninitialization sequence
> _completes_, i.e. this was done in vm_event_cleanup_domain():
> 
>     for_each_vcpu ( d, v )
>     {
>         xfree(v->arch.vm_event);
>         v->arch.vm_event = NULL;
>     }
> 
> 4. and then, in hvm_do_resume(), this gets executed:
> 
>         struct monitor_write_data *w = &v->arch.vm_event->write_data;
> 
>         [...]
> 
>         if ( w->do_write.msr )
>         {
>             hvm_msr_write_intercept(w->msr, w->value, 0);
>             w->do_write.msr = 0;
>         }
> 
> then you'll have a beautiful NULL-pointer access hypervisor crash.
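> 
> The kind of additional locking I have in mind would make both sides
> serialize on one lock, roughly like this (sketch only; the lock name
> is made up):
> 
>     /* hvm_do_resume() side: check and consume under the lock. */
>     spin_lock(&d->vm_event_lock);
>     if ( v->arch.vm_event )
>     {
>         struct monitor_write_data *w = &v->arch.vm_event->write_data;
>         /* ... handle w->do_write.* ... */
>     }
>     spin_unlock(&d->vm_event_lock);
> 
>     /* vm_event_cleanup_domain() side: free and clear under the same
>      * lock, so the consumer can never see a dangling pointer. */
>     spin_lock(&d->vm_event_lock);
>     for_each_vcpu ( d, v )
>     {
>         xfree(v->arch.vm_event);
>         v->arch.vm_event = NULL;
>     }
>     spin_unlock(&d->vm_event_lock);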

As I've said before, of course I agree with that - isn't that what a
previous patch you submitted addressed? I was merely replying to your
concerns assuming that that issue is addressed (as discussed before),
and assuming that the introspection application behaves gracefully.

Again, I'm all for your proposed changes with the additional locking. I
personally do prefer as many safeguards as possible.


Thanks,
Razvan
