[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 2/5] xen: make evtchn_alloc_unbound public



On Fri, Mar 25, 2022 at 11:46 AM Daniel P. Smith <dpsmith.dev@xxxxxxxxx> wrote:
>
> On 3/24/22 20:30, Stefano Stabellini wrote:
> > On Wed, 23 Mar 2022, Jan Beulich wrote:
> >> On 23.03.2022 01:22, Stefano Stabellini wrote:
> >>> The existing XSM check in evtchn_alloc_unbound cannot work and should
> >>> not work: it is based on the current domain having enough privileges to
> >>> create the event channel. In this case, we have no current domain at
> >>> all. The current domain is Xen itself.
> >>
> >> And DOM_XEN cannot be given the appropriate permission, perhaps
> >> explicitly when using a real policy and by default in dummy and SILO
> >> modes?
> >
> > The issue is that the check is based on "current", not one of the
> > domains passed as an argument to evtchn_alloc_unbound. Otherwise,
> > passing DOMID_XEN would be straightforward.
> >
> > We would need to use a hack (like Daniel wrote in the other email) to
> > set the idle_domain temporarily as a privileged domain only for the sake
> > of passing an irrelevant (irrelevant as in "not relevant to this case")
> > XSM check. That cannot be an improvement. Also, setting current to a
> > "fake" domain is not great either.
>
> My suggestion was not to intended to be simply a hack but looking at the
> larger issue instead of simply doing a targeted fix for this one
> instnace. While I cannot give an example right off hand, the reality is,
> at least for hyperlaunch, that we cannot say for certain there will not
> be further resource allocations that is protected by the security
> framework and will require preliminary handling by the construction
> logic in the hypervisor. The low-complexity approach is to address each
> one in a case-by-case fashion using direct calls that go around the
> security framework. A more security conscience, and higher complexity,
> approach would be to consider a least-privilege approach and look at
> introducing the ability to do controlled switching of contexts, i.e.
> moving `current` from DOMID_IDLE to DOMID_CONSTRUCT, to one that is
> granted only the necessary privileges to do the resource allocations in
> which it is limited.
>
> This is also not the first time this issue has come up, I don't recall
> the exact thread but several months ago someone ran into the issue they
> need to make a call to a resource function and was blocked by XSM
> because DOMID_IDLE has no privileges. The reality is that the idea of
> monolithic high-privileged entities is being dropped in favor of
> least-privilege, and where possible hardware enforced, constraint. This
> can be seen with Intel de-privileging SMM and running SMI handlers in
> constrained ring 3. Arm is gaining capability pointers, CHERI, that will
> enable the possibility for constrained, least-privileged kernel
> subsystems. Would it not be advantageous for Xen to start moving in such
> a direction that would enable it to provide a new level of safety and
> security for consumers of Xen?
>
> Coming back to the short-term, I would advocate for introducing the
> concept and abstraction of constrained context switching through a set
> of function calls, which would likely be under XSM to allow policy
> enforcement. Likely the introductory implementation would just mask the
> fact that it is just setting `is_privileged` for DOMID_IDLE. Future
> evolution of the capability could see the introduction of new
> "contexts", whether they are represented by a domain could be determined
> then, and the ability to do controlled switching based on policy.

For the specific case of evtchn_alloc_unbound, Flask's
xsm_evtchn_unbound has a side effect of labeling the event channel.
So skipping the hook will have unintended consequences for Flask.

xsm_evtchn_unbound could be split in two to have an access piece and a
labeling piece.  The access piece is run at hypercall entry, and the
labeling is still done in evtchn_alloc_unbound.  For Flask, labeling
depends on the two domain endpoints, but not current.

More generally, it seems to me there are too many xsm checks in the
middle of functions/operations.  They are fine for a normal entry via
hypercall, but they interfere with Xen's internal operations.  Xen
shouldn't be restricted in its own operations.  The live update people
hit it with domain creation, and I just posted a patch for
unmap_domain_pirq.

It would be more obvious for auditing if each hypercall entrypoint
applied xsm checks.  Make the allow/deny decision as early as
possible.  Then a worker function would be easily callable for the
Xen-internal case.  The flip side of that is the xsm hook may need
sub-op specific data to make its decision, so it fits to put it in the
sub-op function.  It seems to me the location of hooks was determined
by where the data they need is already available.  Re-arranging hooks
may require some duplication.

The xsm controls should clearly apply to the DomUs and other entities
managed by Xen.  xsm restricting Xen itself seems wrong.  Having
internal operations get denied by xsm may unintentionally subvert Xen
itself.

Yes, moving checks outward makes for an un-restricted middle.  But the
security server running in the same address space isn't going to help
against code exec inside the hypervisor.

I think I view it like this:  Why is a xen internal operation subject
to a security check?  When would one ever want to deny a xen internal
operation?

Regards,
Jason



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.