
Re: [Xen-devel] More questions about Xen memory layout/usage, access to guest memory


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: "Johnson, Ethan" <ejohns48@xxxxxxxxxxxxxxxx>
  • Date: Thu, 22 Aug 2019 02:06:50 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 8/17/2019 7:04 AM, Andrew Cooper wrote:
>> Similarly, to what extent does the dom0 (or other such
>> privileged domain) utilize "foreign memory maps" to reach into another
>> guest's memory? I understand that this is necessary when creating a
>> guest, for live migration, and for QEMU to emulate stuff for HVM guests;
>> but for PVH, is it ever necessary for Xen or the dom0 to "forcibly"
>> access a guest's memory?
> I'm not sure what you mean by forcibly.  Dom0 has the ability to do so,
> if it chooses.  There is no "force" about it.
>
> Debuggers and/or Introspection are other reasons why dom0 might choose to
> map guest RAM, but I think you've covered the common cases.
>
>> (I ask because the research project I'm working on is seeking to protect
>> guests from a compromised hypervisor and dom0, so I need to limit
>> outside access to a guest's memory to explicitly shared pages that the
>> guest will treat as untrusted - not storing any secrets there, vetting
>> input as necessary, etc.)
> Sorry to come along with roadblocks, but how on earth do you intend to
> prevent a compromised Xen from accessing guest memory?  A compromised
> Xen can do almost anything it likes, and without recourse.  This is
> ultimately why technologies such as Intel SGX or AMD Secure Encrypted VM
> are coming along, because only the hardware itself is in a position to
> isolate an untrusted hypervisor/kernel from guest data.
>
> For dom0, that's perhaps easier.  You could reference count the number
> of foreign mappings into the domain as it is created, and refuse to
> unpause the guests vcpus until the foreign map count has dropped to 0.

We're using a technique where privileged system software (in this case,
the hypervisor) is compiled to a virtual instruction set (based on LLVM
IR) that limits its access to hardware features and its view of
available memory. These limitations can be enforced in a variety of
ways, but we rely mainly on two techniques. The first is software fault
isolation: memory loads and stores in privileged code are instrumented
with checks to ensure they aren't accessing forbidden regions. The
second is mediation of page table updates: privileged software is
modified to make page table updates through a virtual instruction set
interface, much as Xen PV guests make page table updates through
hypercalls, which gives Xen the opportunity to ensure mappings aren't
made to protected regions.
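
As a rough illustration of those two checks (this is only a sketch, not
code from our system): the protected range, the function names, and the
choice to return false instead of trapping are all made up for this
example, and a single range stands in for both the virtual addresses
checked on stores and the physical frames checked on page table updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protected range; a real system would track many ranges. */
#define PROTECTED_BASE UINT64_C(0x100000000)
#define PROTECTED_SIZE UINT64_C(0x40000000)

/* x86-64 page table entry layout: bit 0 = present, bits 12..51 = frame. */
#define PTE_PRESENT    UINT64_C(0x1)
#define PTE_ADDR_MASK  UINT64_C(0x000ffffffffff000)
#define PAGE_SIZE_4K   UINT64_C(4096)

/* True if [addr, addr + len) overlaps the protected range. */
static bool sfi_overlaps_protected(uint64_t addr, uint64_t len)
{
    uint64_t last;

    if (len == 0)
        return false;
    /* An access that wraps around the address space is rejected. */
    if (addr > UINT64_MAX - (len - 1))
        return true;
    last = addr + (len - 1); /* last byte touched, inclusive */
    return addr < PROTECTED_BASE + PROTECTED_SIZE && last >= PROTECTED_BASE;
}

/*
 * What an instrumented 64-bit store might look like: the check runs
 * first, and the store only happens if the target is permitted.
 */
static bool sfi_checked_store64(uint64_t *dst, uint64_t value)
{
    assert(dst != NULL);
    if (sfi_overlaps_protected((uint64_t)(uintptr_t)dst, sizeof(*dst)))
        return false;
    *dst = value;
    return true;
}

/*
 * Mediated page table update: refuse to install a present mapping
 * whose target frame lies in the protected range.
 */
static bool mediated_pte_update(uint64_t *pte, uint64_t new_val)
{
    assert(pte != NULL);
    if ((new_val & PTE_PRESENT) &&
        sfi_overlaps_protected(new_val & PTE_ADDR_MASK, PAGE_SIZE_4K))
        return false;
    *pte = new_val;
    return true;
}
```

The point of the second function is that once every page table write
goes through a single checked entry point, privileged code cannot map a
protected frame somewhere else and bypass the load/store checks that way.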

Our technique is based on that used by the "Virtual Ghost" project (see 
https://dl.acm.org/citation.cfm?id=2541986 for the paper; direct PDF 
link: http://sva.cs.illinois.edu/pubs/VirtualGhost-ASPLOS-2014.pdf), 
which does something similar to protect applications from a compromised 
operating system kernel without relying on something like a hypervisor 
operating at a higher privileged level. We're looking to extend that 
approach to hypervisors to protect guest VMs from a compromised hypervisor.

>> Again, this mostly boils down to: under what circumstances, if ever,
>> does Xen ever "force" access to any part of a guest's memory?
>> (Particularly for PV(H). Clearly that must happen for HVM since, by
>> definition, the guest is unaware there's a hypervisor controlling its
>> world and emulating hardware behavior, and thus is in no position to
>> cooperatively/voluntarily give the hypervisor and dom0 access to its
>> memory.)
> There are cases for all guest types where Xen will need to emulate
> instructions.  Xen will access guest memory in order to perform
> architecturally correct actions, which generally starts with reading the
> instruction under %rip.
>
> For PV guests, this is almost entirely restricted to guest-kernel
> operations which are privileged in nature.  Access to MSRs, writes to
> pagetables, etc.
>
> For HVM and PVH guests, while PVH means "HVM without Qemu", it doesn't
> mean a complete absence of emulation.  The Local APIC is emulated by Xen
> in most cases, as a bare minimum, but for example, the LMSW instruction
> on AMD hardware doesn't have any intercept decoding to help the
> hypervisor out when a guest uses the instruction.
>
> ~Andrew

I've found a number of files in the Xen source tree which seem to be 
related to instruction/x86 platform emulation:

arch/x86/x86_emulate.c
arch/x86/hvm/emulate.c
arch/x86/hvm/vmx/realmode.c
arch/x86/hvm/svm/emulate.c
arch/x86/pv/emulate.c
arch/x86/pv/emul-priv-op.c
arch/x86/x86_emulate/x86_emulate.c

The last of these, in particular, looks especially hairy (it seems to 
support emulation of essentially the entire x86 instruction set through 
a quite impressive edifice of switch statements).

How does all of this fit into the big picture of how Xen virtualizes the 
different types of VMs (PV/HVM/PVH)?

My impression (from reading the original "Xen and the Art of 
Virtualization" SOSP '03 paper that describes the basic architecture) 
had been that PV guests, in particular, used hypercalls in place of all 
privileged operations that the guest kernel would otherwise need to 
execute in ring 0; and that all other (unprivileged) operations could 
execute natively on the CPU without requiring emulation. From what 
you're saying (and what I'm seeing in the source code), though, it 
sounds like in reality things are a bit fuzzier - that there are some 
operations that Xen traps and emulates instead of explicitly 
paravirtualizing.

Likewise, the Xen design described in the SOSP paper discussed guest I/O 
as something that's fully paravirtualized, taking place not through 
emulation of either memory-mapped or port I/O but rather through ring 
buffers shared between the guest and dom0 via grant tables. I was a bit 
confused to find I/O emulation code under arch/x86/pv (see e.g. 
arch/x86/pv/emul-priv-op.c) that seems to be talking about "ports" and 
the like. Is this another example of things being fuzzier in reality 
than in the "theoretical" PV design? What devices, if any, are emulated 
rather than paravirtualized for a PV guest? I know that for PVH, you 
mentioned that the Local APIC is (at a minimum) emulated, along with 
some special instructions; is that true for classic PV as well?

For HVM, obviously anything that can't be virtualized natively by the 
hardware needs to be emulated by Xen/QEMU (since the guest kernel isn't 
expected to cooperate by issuing PV hypercalls instead); but I would
expect emulation to be limited to the relatively small subset of the ISA 
that VMX/SVM can't natively virtualize. Yet I see that x86_emulate.c 
supports emulating just about everything. Under what circumstances does 
Xen actually need to put all that emulation code to use?

I'm also wondering just how much of this is Xen's responsibility vs. 
QEMU's. I understand that when QEMU is used on its own (i.e., not with 
Xen), it uses dynamic binary recompilation to handle the parts of the 
ISA that can't be virtualized natively in lower-privilege modes. Does 
Xen only use QEMU for emulating off-CPU devices (interrupt controller, 
non-paravirtualized disk/network/graphics/etc.), or does it ever employ 
any of QEMU's x86 emulation support in addition to Xen's own emulation code?

Is there any particular place in the code where I can go to get a 
comprehensive "list" (or other such summary) of which parts of the ISA 
and off-CPU system are emulated for each respective guest type (PV, HVM, 
and PVH)? I realize that the difference between HVM and PVH is more of a 
continuum than a line; what I'm especially interested in is, what's the 
*bare minimum* of emulation required for a PVH guest that's using as 
much paravirtualization as possible? (That's the setting I'm looking to 
target for my research on protecting guests from a compromised 
hypervisor, since I'm trying to minimize the scope of interactions 
between the guest and hypervisor/dom0 that our virtual instruction set 
layer needs to mediate.)


On a somewhat related note, I also have a question about a particular 
piece of code in arch/x86/pv/emul-priv-op.c, namely the function 
io_emul_stub_setup(). It looks like it is, at runtime, crafting a 
function that switches to the guest register context, emulates a 
particular I/O operation, then switches back to the host register 
context. This caught our attention while we were implementing Control 
Flow Integrity (CFI) instrumentation for Xen (which is necessary for us 
to enforce the software fault isolation (SFI) instrumentation that 
provides our memory protections). Why does Xen use dynamically-generated 
code here? Is it just for implementation convenience (i.e., to improve 
the generalizability of the code)?
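
To show the kind of thing I mean by dynamically generated code, here is
a minimal sketch that assembles the bytes of a tiny port I/O stub into a
buffer. It is not Xen's code: build_port_io_stub() and STUB_MAX are
names invented for this example, it only handles the immediate-port
forms of IN/OUT, it leaves out the register context switching entirely,
and it never executes the bytes it produces (which would need executable
memory and ring 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest stub built here: 0x66 prefix + opcode + imm8 + ret. */
#define STUB_MAX 4

/*
 * Fill `stub` with machine code for "in $port, %al/%ax/%eax; ret" or
 * "out %al/%ax/%eax, $port; ret". Returns the number of bytes written,
 * or 0 if `bytes` is not a valid operand size (1, 2, or 4).
 */
static size_t build_port_io_stub(uint8_t stub[STUB_MAX], int is_out,
                                 unsigned int bytes, uint8_t port)
{
    size_t n = 0;
    uint8_t opcode;

    assert(stub != NULL);
    if (bytes != 1 && bytes != 2 && bytes != 4)
        return 0;

    /*
     * x86 opcodes: E4/E5 = IN with imm8 port, E6/E7 = OUT with imm8
     * port. The low bit selects byte (0) versus word/dword (1).
     */
    opcode = (uint8_t)((is_out ? 0xE6 : 0xE4) | (bytes != 1 ? 1 : 0));

    if (bytes == 2)
        stub[n++] = 0x66;   /* operand-size prefix: 16-bit access */
    stub[n++] = opcode;
    stub[n++] = port;       /* imm8 port number */
    stub[n++] = 0xC3;       /* ret */
    return n;
}
```

For example, build_port_io_stub(buf, 0, 1, 0x80) yields the three bytes
E4 80 C3. Because the opcode, prefix, and port are only known once the
guest's instruction has been decoded, the bytes have to be produced at
runtime, which is exactly what makes this awkward for CFI.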

Thanks again for all your time and effort spent answering my questions. 
I know I'm throwing a lot of unusual questions out there - this 
back-and-forth has been very helpful for me in figuring out *what* 
questions I need to be asking in the first place to understand what's 
feasible to do in the Xen architecture and how I might go about doing 
it. :-)

Thanks,
Ethan Johnson

-- 
Ethan J. Johnson
Computer Science PhD student, Systems group, University of Rochester
ejohns48@xxxxxxxxxxxxxxxx
ethanjohnson@xxxxxxx
PGP public key available from public directory or on request
