
Re: [Xen-devel] RFC: HVM de-privileged mode scheduling considerations

On 03/08/15 14:35, Ben Catterall wrote:
> Hi all,
> I am working on an x86 proof-of-concept to evaluate if it is feasible
> to move device models and x86 emulation code for HVM guests into a
> de-privileged context.
> I was hoping to get feedback from relevant maintainers on scheduling
> considerations for this system to mitigate potential DoS attacks.
> Many thanks in advance,
> Ben
> This is intended as a proof-of-concept, with the aim of determining if
> this idea is feasible within performance constraints.
> Motivation
> ----------
> The motivation for moving the device models and x86 emulation code
> into ring 3 is to mitigate a system compromise due to a bug in any of
> these systems. These systems are currently part of the hypervisor and,
> consequently, a bug in any of these could allow an attacker to gain
> control of (or perform a DoS on) Xen and/or guests.
> Migrating between PCPUs
> -----------------------
> There is a need to support migration between pcpus so that the
> scheduler can still perform this operation. However, there is an issue
> to resolve. Currently, I have a per-vcpu copy of the Xen ring 0 stack
> up to the point of entering the de-privileged mode. This allows us to
> restore this stack and then continue from the entry point when we have
> finished in de-privileged mode. There will be per-pcpu data on these
> per-vcpu stacks such as saved stack frame pointers for the per-pcpu
> stack, smp_processor_id() responses etc.
> Therefore, it will be necessary to lock the vcpu to the current pcpu
> when it enters this user mode so that it does not wake up on a
> different pcpu where such pointers and other data are invalid. We can
> do this by setting a hard affinity to the pcpu that the vcpu is
> executing on. See common/wait.c which does something similar to what I
> am doing.
> However, needing to have hard affinity to a pcpu leads to the
> following problem:
> - An attacker could lock multiple vcpus to a single pcpu, leading to a
> DoS. This could be achieved by spinning in a loop in Xen
> de-privileged mode (assuming a bug in this mode) and performing this
> operation on multiple vcpus at once. The attacker could wait until all
> of their vcpus were on the same pcpu and then execute this attack.
> This could cause the pcpu to, effectively, lock up, as it will be
> under heavy load, and we would be unable to move work elsewhere.
> A solution to the DoS would be to force migration to another pcpu if
> the vcpu has remained in de-privileged mode for, say, 100 quanta.
> This forced migration would require us to
> forcibly complete the de-privileged operation, and then, just before
> returning into the guest, force a cpu change. We could not just force
> a migration at the schedule call point as the Xen stack needs to
> unwind to free up resources. We would reset this count each time we
> completed a de-privileged mode operation.
> A legitimate long-running de-privileged operation would trigger this
> forced migration mechanism. However, it is unlikely that such
> operations will be needed and the count can be adjusted appropriately
> to mitigate this.
> Any suggestions or feedback would be appreciated!

I don't see why any scheduling support is needed.

Currently all operations like this are run synchronously in the vmexit
context of the vcpu.  Any current DoS is already a real issue.

In any reasonable situation, emulation of a device is a small state
mutation which occasionally kicks off a further action.  (The far
bigger risk from this kind of emulation is following bad pointers/etc,
rather than long loops.)

I think it would be entirely reasonable to have a deadline for a single
execution of depriv mode, after which the domain is declared malicious
and killed.

We already have this for host pcpus - the watchdog defaults to 5
seconds.  Having a similar cutoff for depriv mode should be fine.
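
As a rough illustration of such a cutoff, here is a minimal sketch in plain C. Every name in it (struct depriv_run, depriv_enter(), depriv_exit(), depriv_deadline_exceeded(), DEPRIV_DEADLINE_NS) is hypothetical and not an existing Xen interface, and the 5 second value is only an assumption chosen to mirror the watchdog default mentioned above. It records a timestamp on entry to depriv mode and lets a periodic check decide whether a single execution has overrun.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these are existing Xen interfaces. */

/* Assumed limit: mirrors the 5 second host watchdog default. */
#define DEPRIV_DEADLINE_NS (5ULL * 1000000000ULL)

/* Per-vcpu bookkeeping for one execution of depriv mode. */
struct depriv_run {
    uint64_t entry_ns; /* timestamp taken on entry to depriv mode */
    bool     active;   /* true while the vcpu is in depriv mode */
};

/* Record the entry time when the vcpu drops into depriv mode. */
static void depriv_enter(struct depriv_run *r, uint64_t now_ns)
{
    r->entry_ns = now_ns;
    r->active = true;
}

/* Clear the state when the depriv operation completes normally. */
static void depriv_exit(struct depriv_run *r)
{
    r->active = false;
}

/*
 * Intended to be polled periodically (for example from a timer).
 * Returns true if the current execution has run past the deadline,
 * at which point the caller would treat the domain as malicious.
 */
static bool depriv_deadline_exceeded(const struct depriv_run *r,
                                     uint64_t now_ns)
{
    return r->active && (now_ns - r->entry_ns) > DEPRIV_DEADLINE_NS;
}
```

The check is per execution rather than cumulative, so a domain that repeatedly enters and leaves depriv mode within the limit is never flagged. What the caller does on a true result (killing the domain) is left out of the sketch.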

