
Re: [PATCH v4 for-4.14] x86/monitor: revert default behavior when monitoring register write events


  • To: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>
  • From: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
  • Date: Mon, 8 Jun 2020 23:14:02 +0300
  • Cc: Petre Pircalabu <ppircalabu@xxxxxxxxxxxxxxx>, Andrei LUTAS <vlutas@xxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Mihai Donțu <mdontu@xxxxxxxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Alexandru Isaila <aisaila@xxxxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Tue, 09 Jun 2020 04:17:03 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 6/8/20 10:54 PM, Tamas K Lengyel wrote:
> On Mon, Jun 8, 2020 at 12:58 PM Razvan Cojocaru
>> And last but not least, the proper sequence is: 1. unsubscribe from
>> register write events, 2. process all events "still in the chamber"
>> (keep checking the ring buffer for a while), 3. detach from the guest
>> (disable the vm_event subsystem). Not ideal perhaps (in that it's not
>> guaranteed that a VCPU won't resume after a longer period than our
>> timeout), but if the sequence is followed there should be no guest hangs
>> or crashes (at least none that we or our clients have observed so far).
> 
> Incorrect. That's not enough. You also have to wait for all the vCPUs
> to get scheduled before disabling vm_event or otherwise the emulation
> is skipped entirely. Please read the patch message. If the user
> decides to disable vm_event after getting a CR3 event delivered, the
> CR3 never gets updated and results in the guest crashing in
> unpredictable ways. Same happens with all the other registers.

I did read the patch message. As I've previously stated ("it's not
guaranteed that a VCPU won't resume after a longer period than our
timeout"), the approach is not ideal, and I've made no claim that the
problem doesn't exist or that it shouldn't be fixed. But really, if you've
got a VCPU that waits more than a couple of seconds to get scheduled,
then you've got a bigger problem with the VM.

>> So in short, I think there's a better fix for this by simply not 
>> allocating the vm_event memory on-demand anymore and never having to
>> deal with lost pending emulations again. It should also decrease code
>> complexity by a tiny bit. Then again, as stated at the beginning of this
>> message, that's just a recommendation.
> 
> Since only you guys use this feature I'm going to wait for a fix.
> Until then, the default behavior should be restored so this buggy
> behavior doesn't affect anyone else. You can still turn it on, it's
> just not going to be on for vm_event by default. I don't particularly
> care what fix is there since only you guys use it. If you don't mind
> that there is this buggy behavior because you never disable vm_events
> once you activate it then that's that.

Indeed, in our usual scenario vm_event is always on for the machine.
It's only rarely disabled while the guest is running, and when it is,
we always wait sufficiently long that we've seen no unexplained hung
VMs.

Fair enough. As long as the previous behaviour remains optionally
available, I see no reason why this patch shouldn't make it in. Though,
of course, as also previously stated, I'm just trying to shed as much
light as possible on this from what I remember; what happens next is not
my call. My colleagues should be able to chime in tomorrow.


Cheers,
Razvan
