
Re: [Xen-devel] [RFC] Overview of work required to implement mem_access for PV guests



>>> The mem_access APIs only work with HVM guests that run on Intel
>>> hardware with EPT support. This effort is to enable it for PV guests that run
>>> with shadow page tables. To facilitate this, the following will be done:
>>> 
>>> 1. A magic page will be created for the mem_access (mem_event) ring
>>> buffer during the PV domain creation.
>> 
>> As Andrew pointed out, you might have to be careful about this -- if the page
>> is owned by the domain itself, and it can find out (or guess) its MFN, it can
>> map and write to it.  You might need to allocate an anonymous page for this?
> 
> Do you mean allocate an anonymous page in dom0 and use that? Won't we run into 
> the problem Andres was mentioning a while back?
> http://xen.markmail.org/thread/kbrz7vo3oyrvgsnc
> Or were you meaning something else?
> 
> I was planning on doing exactly what we do in the mem_access listener for HVM 
> guests. The magic page is mapped in and then removed from the physmap of the 
> guest.
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/xen-access/xen-access.c;h=b00c05aa4890ee694e8101b77cca582fff420c7b;hb=HEAD#l333

Once the page is removed from the physmap, an HVM guest has no way of indexing 
that page and thus no way of mapping it -- even though the page belongs to it 
and is threaded onto its list of owned pages.

With PV, you have an additional means of indexing, which is the raw MFN. The PV 
guest will be able to get at the page because it owns it, if it knows the MFN. 
No PFN/GFN required. This is how, for example, things like the grant table are 
mapped in classic PV domains.

I don't know how realistic the concern is about the domain guessing the MFN for 
the page. But if it can, and it maps the page and mucks with the ring, the thing 
to evaluate is: can the guest throw dom0/the host into a tailspin? The answer is 
likely "no", because guests can't reasonably do this with other rings they have 
access to, like PV driver backends. But a flaw on the consumer side of mem 
events could still yield a DoS vector.

If, instead, the page is a Xen-owned page (alloc_xenheap_pages), then there is 
no way for the PV domain to map it.
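
For reference, the flow the HVM listener uses today (the xen-access code linked
above) is roughly the sketch below. It is simplified, error handling and the
event-channel binding are elided, and the exact libxc signatures vary a bit
between releases, so treat it as illustrative only:

/* Map the magic ring page by its HVM param, enable mem_access, then
 * drop the page from the guest's physmap so the guest can no longer
 * name it by PFN.  For PV that last step is not sufficient on its own,
 * since the guest could still map the raw MFN -- the concern above. */
#include <sys/mman.h>
#include <xenctrl.h>
#include <xen/hvm/params.h>

static void *map_access_ring(xc_interface *xch, domid_t domid, uint32_t *port)
{
    unsigned long ring_pfn;
    xen_pfn_t mmap_pfn;
    void *ring_page;

    if ( xc_get_hvm_param(xch, domid, HVM_PARAM_ACCESS_RING_PFN, &ring_pfn) )
        return NULL;

    mmap_pfn = ring_pfn;
    ring_page = xc_map_foreign_batch(xch, domid, PROT_READ | PROT_WRITE,
                                     &mmap_pfn, 1);
    if ( ring_page == NULL )
        return NULL;

    /* Ask Xen to start delivering mem_access events; *port is the event
     * channel the listener then binds to. */
    if ( xc_mem_access_enable(xch, domid, port) )
        return NULL;

    /* Remove the ring page from the guest's physmap. */
    xc_domain_decrease_reservation_exact(xch, domid, 1, 0, &mmap_pfn);

    return ring_page;
}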

> 
> From my reading of xc_domain_decrease_reservation_exact(), I think it will 
> also work for PV guests. Or am I missing something here?
> 
>>> 2. Most of the mem_event / mem_access functions and variable names are
>>> HVM-specific. Given that I am enabling it for PV, I will change the
>>> names to something more generic. This also holds for the mem_access
>>> hypercalls, which fall under HVM ops and do_hvm_op(). My plan is to
>>> make them a memory op or a domctl.
>> 
>> Sure.
>> 
>>> 3. A new shadow option will be added called PG_mem_access. This mode
>>> is basic shadow mode with the addition of a table that will track the
>>> access permissions of each page in the guest:
>>> mem_access_tracker[gmfn] = access_type. If there is a place where I can
>>> stash this in an existing structure, please point me at it.
>> 
>> My suggestion was that you should make another implementation of the
>> p2m.h interface, which is already called in all the right places.  You might 
>> want
>> to borrow the tree-building code from the existing p2m-pt.c, though there's
>> no reason why your table should be structured as a pagetable.  The important
>> detail is that you should be using memory from the shadow pool to hold this
>> datastructure.
> 
> OK, I will go down that path. I agree that my table needn't be structured as a 
> pagetable. The other thing I was thinking about is stashing the access 
> information in the per-MFN page_info structures. Or is that memory overhead 
> overkill?

Well, the page/MFN could conceivably be mapped by many domains. There are ample 
bits to play with in the type flag, for example. But as long as you don't care 
about mem_event on pages shared across two or more PV domains, then that should 
be fine. I wouldn't blame you if you didn't care :)

OTOH, all you need is a byte per pfn, and the great thing is that in PV 
domains the physmap is bounded and contiguous. Unlike HVM, with its PCI holes, 
etc., which demand the sparse tree structure. So you can allocate an easily 
indexable array, notwithstanding superpage concerns (I think/hope).
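
Purely to illustrate the byte-per-pfn idea, something like the sketch below
(the structure and helper names are invented, and a real implementation would
take its memory from the shadow pool as suggested above rather than a plain
xzalloc):

/* Hypothetical per-domain access table: one byte per pfn, directly
 * indexable because a PV physmap is bounded and contiguous.  The values
 * stored would be p2m_access_t.  Not existing Xen code. */
#include <xen/types.h>
#include <xen/errno.h>
#include <xen/xmalloc.h>

struct pv_mem_access_table {
    unsigned long max_pfn;      /* bound of the PV physmap */
    uint8_t *access;            /* one access value per pfn */
};

static int pv_mem_access_init(struct pv_mem_access_table *t,
                              unsigned long max_pfn)
{
    t->max_pfn = max_pfn;
    t->access = xzalloc_array(uint8_t, max_pfn);
    return t->access ? 0 : -ENOMEM;
}

static void pv_mem_access_set(struct pv_mem_access_table *t,
                              unsigned long pfn, uint8_t a)
{
    if ( pfn < t->max_pfn )
        t->access[pfn] = a;
}

static uint8_t pv_mem_access_get(const struct pv_mem_access_table *t,
                                 unsigned long pfn, uint8_t default_access)
{
    return pfn < t->max_pfn ? t->access[pfn] : default_access;
}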

Andres
> 
>>> 6. xc_(hvm)_set_mem_access(): This API has two modes: if the start
>>> pfn/gmfn is ~0ull, it is taken as a request to set the default access.
>>> Here we will call shadow_blow_tables() after recording the default
>>> access type for the domain. In the mode where it is setting the mem_access
>>> type for individual gmfns, we will call a function that will drop the
>>> shadow for that individual gmfn. I am not sure which function to call.
>>> Will sh_remove_all_mappings(gmfn) do the trick?
>> 
>> Yes, sh_remove_all_mappings() is the one you want.
>> 
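
To make point 6 concrete, the dispatch could look something like the sketch
below. shadow_blow_tables() and sh_remove_all_mappings() are the existing
shadow entry points discussed here; the default-access field and the table
setter are the hypothetical pieces from point 3, so none of this is existing
code:

/* Sketch, as it might sit in the shadow code.  "Default access" mode
 * records the new default and blows all shadows so every mapping is
 * re-checked on the next fault; per-frame mode records the type and
 * drops just that frame's shadow mappings.  (For PV the frame number
 * here is effectively the gmfn/mfn.) */
static int pv_mem_access_set_access(struct domain *d, uint64_t pfn,
                                    p2m_access_t a)
{
    if ( pfn == ~0ull )                         /* set-default-access mode */
    {
        d->arch.paging.mem_access_default = a;  /* hypothetical field */
        shadow_blow_tables(d);                  /* force re-faulting */
        return 0;
    }

    pv_mem_access_set(&d->arch.paging.mem_access_table, pfn, a);
    /* Drop existing shadow L1 entries for this frame so the new
     * permission is picked up by sh_page_fault()/_sh_propagate().
     * (sh_remove_all_mappings() takes a vcpu or a domain depending on
     * the tree.) */
    sh_remove_all_mappings(d->vcpu[0], _mfn(pfn));
    return 0;
}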
>>> The other issue here is that in the HVM case we could use
>>> xc_hvm_set_mem_access(gfn, nr) and the permissions for the range gfn
>>> to gfn+nr would be set. This won't be possible in the PV case as we
>>> are actually dealing with mfns, and mfn to mfn+nr need not belong to
>>> the same guest. But given that setting *all* page access permissions
>>> is done implicitly when setting the default access, I think we can live
>>> with setting page permissions one at a time as they are faulted in.
>> 
>> Seems OK to me.
>> 
>>> 8. In sh_page_fault() perform access checks similar to
>>> ept_handle_violation() / hvm_hap_nested_page_fault().
>> 
>> Yep.
>> 
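
For point 8, the check itself is mostly a translation of what the HVM side does
into page-fault error-code terms. A sketch, again using the hypothetical lookup
from point 3 (on a violation the caller would pause the vcpu and post a
mem_event request, as the HVM path does; the w/x-only and conversion access
types are omitted for brevity):

/* Would be called early in sh_page_fault(), with the page-fault error
 * code (PFEC_* bits) and the frame being accessed. */
static bool_t pv_mem_access_violation(struct domain *d, unsigned long pfn,
                                      uint32_t pfec)
{
    p2m_access_t a = pv_mem_access_get(&d->arch.paging.mem_access_table, pfn,
                                       d->arch.paging.mem_access_default);

    switch ( a )
    {
    case p2m_access_n:
        return 1;                               /* any access faults */
    case p2m_access_r:
        return !!(pfec & (PFEC_write_access | PFEC_insn_fetch));
    case p2m_access_rw:
        return !!(pfec & PFEC_insn_fetch);
    case p2m_access_rx:
        return !!(pfec & PFEC_write_access);
    case p2m_access_rwx:
    default:
        return 0;
    }
}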
>>> 9. Hook into _sh_propagate() and set up the L1 entries based on
>>> access permissions. This will be similar to ept_p2m_type_to_flags(). I
>>> think I might also have to hook into the code that emulates page
>>> table writes to ensure access permissions are honored there too.
>> 
>> I guess you might; again, the p2m interface will help here, and probably the
>> existing tidy-up code in emulate_gva_to_mfn will be the place to hook.
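
And for point 9, the flag adjustment is the shadow analogue of
ept_p2m_type_to_flags(); something along the lines below, applied where
_sh_propagate() computes sflags. Again a sketch: only the present/write bits
are shown, and NX/exec restrictions plus the rx2rw/n2rwx conversion types need
more care than this.

/* Restrict the shadow L1 flags according to the recorded access type. */
static u32 pv_mem_access_mask_sflags(p2m_access_t a, u32 sflags)
{
    switch ( a )
    {
    case p2m_access_n:
        sflags &= ~_PAGE_PRESENT;       /* every access must fault */
        break;
    case p2m_access_r:
    case p2m_access_rx:
        sflags &= ~_PAGE_RW;            /* writes must fault */
        break;
    default:
        break;
    }
    return sflags;
}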
> 
> Thanks so much for the feedback.
> Aravindh

