
Re: [PATCH v4 for-4.14] x86/monitor: revert default behavior when monitoring register write events


  • To: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>
  • From: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
  • Date: Mon, 8 Jun 2020 23:14:02 +0300
  • Cc: Petre Pircalabu <ppircalabu@xxxxxxxxxxxxxxx>, Andrei LUTAS <vlutas@xxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Mihai Donțu <mdontu@xxxxxxxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Alexandru Isaila <aisaila@xxxxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Tue, 09 Jun 2020 04:17:03 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 6/8/20 10:54 PM, Tamas K Lengyel wrote:
> On Mon, Jun 8, 2020 at 12:58 PM Razvan Cojocaru
>> And last but not least, the proper sequence is: 1. unsubscribe from
>> register write events, 2. process all events "still in the chamber"
>> (keep checking the ring buffer for a while), 3. detach from the guest
>> (disable the vm_event subsystem). Not ideal perhaps (in that it's not
>> guaranteed that a VCPU won't resume after a longer period than our
>> timeout), but if the sequence is followed there should be no guest hangs
>> or crashes (at least none that we or our clients have observed so far).
> 
> Incorrect. That's not enough. You also have to wait for all the vCPUs
> to get scheduled before disabling vm_event or otherwise the emulation
> is skipped entirely. Please read the patch message. If the user
> decides to disable vm_event after getting a CR3 event delivered, the
> CR3 never gets updated and results in the guest crashing in
> unpredictable ways. Same happens with all the other registers.

I did read the patch message. As I've previously stated ("it's not
guaranteed that a VCPU won't resume after a longer period than our
timeout"), the approach is not ideal, and I've made no claim that the
problem doesn't exist or that it shouldn't be fixed. But really, if you've
got a VCPU that waits more than a couple of seconds to get scheduled,
then you've got a bigger problem with the VM.

>> So in short, I think there's a better fix for this by simply not 
>> allocating the vm_event memory on-demand anymore and never having to
>> deal with lost pending emulations again. It should also decrease code
>> complexity by a tiny bit. Then again, as stated at the beginning of this
>> message, that's just a recommendation.
> 
> Since only you guys use this feature I'm going to wait for a fix.
> Until then, the default behavior should be restored so this buggy
> behavior doesn't affect anyone else. You can still turn it on, it's
> just not going to be on for vm_event by default. I don't particularly
> care what fix is there since only you guys use it. If you don't mind
> that there is this buggy behavior because you never disable vm_events
> once you activate it then that's that.

Indeed, in our usual scenario vm_event is always on for the machine.
It's only rarely disabled while the guest is running, and when it is,
we always wait sufficiently long that we've seen no unexplained hung
VMs.

Fair enough. As long as the previous behaviour remains optionally
available, I see no reason why this patch shouldn't make it in. Though,
of course, as also previously stated, I'm just trying to shed as much
light as possible on this from what I remember; what happens next is not
my call. My colleagues should be able to chime in tomorrow.


Cheers,
Razvan
