Xen project Mailing List

[Xen-devel] Re: [PATCH 00/17] Netchannel2 for a modern git kernel

> >> BTW, do you see this as a candidate for merging upstream?
> >>
> > I've mostly been defining ``upstream'' as you, but, yes, sending it further would be good.
> OK, but that's a fair bit more work.

Yes, indeed. This is very much a long-term goal. It might make sense to send an initial version which doesn't support receiver-map mode first, because that avoids the whole PG_foreign issue. It'd be a bit slow, but it would work, and it'd be properly cross-compatible with a receiver-map capable version.

> > The NC2 approach is basically similar to the NC1 approach, but generalised so that NC1 and NC2 can cooperate in a reasonably sane way. It still uses the PG_foreign bit to identify foreign pages, and the page->private and page->mapping fields for various bits of information.
> Unfortunately the PG_foreign approach is a non-starter for upstream, mainly because adding new page flags is strongly frowned upon unless there's a very compelling reason. Unless we can find some other kernel subsystems which can make use of a page destructor, we probably won't make the cut. (It doesn't help that there are no more page flags left on 32-bit.)

Yeah, I didn't think that was going to go very far. It might be possible to do something like:

1) Create a special struct address_space somewhere. This wouldn't really do anything, but would just act as a placeholder.
2) Whenever we would normally set PG_foreign, set page->mapping to point at the placeholder address_space.
3) Rather than testing PG_foreign, test page->mapping == &placeholder.
4) Somehow move all of the Xen-specific bits which currently use ->mapping to use ->private instead.

Then we wouldn't need the page bit. It's not even that much of an abuse; foreign memory is arguably a special kind of address space, so creating a struct address_space for it isn't insane.
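The placeholder idea in steps 1) to 4) can be modelled in a few lines of userspace C. This is a minimal sketch, not kernel code: struct page and struct address_space below are cut-down stand-ins for the kernel structures, and the names foreign_placeholder, mark_page_foreign(), page_is_foreign() and clear_page_foreign() are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumption: only the
 * fields discussed in the thread are modelled). */
struct address_space {
    const char *name;            /* placeholder only: no real operations */
};

struct page {
    struct address_space *mapping;
    unsigned long private;
};

/* Step 1: a special address_space that does nothing except mark
 * foreign (grant-mapped) memory. Hypothetical name. */
static struct address_space foreign_placeholder = { "xen-foreign" };

/* Step 2: instead of setting PG_foreign, point ->mapping at the
 * placeholder; Xen-specific data lives in ->private (step 4). */
static void mark_page_foreign(struct page *page, unsigned long xen_info)
{
    assert(page->mapping == NULL);   /* must not already belong to a mapping */
    page->mapping = &foreign_placeholder;
    page->private = xen_info;
}

/* Step 3: this pointer comparison replaces the PG_foreign test. */
static bool page_is_foreign(const struct page *page)
{
    return page->mapping == &foreign_placeholder;
}

/* Undo the marking when the foreign mapping is torn down. */
static void clear_page_foreign(struct page *page)
{
    assert(page_is_foreign(page));
    page->mapping = NULL;
    page->private = 0;
}
```

No page flag is consumed: membership of the placeholder address space carries the same single bit of information that PG_foreign did.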
> The approach I'm trying at the moment is to use the skb destructor mechanism to grab the pages out of the skb as it's freed. To deal with skb_clone, I'm adding a flag to the skb to force a clone to do a complete copy so there are no more aliases to the pages (skb_clone should be rare in the common case).

Yeah, that would work. There needs to be some way for netback to get grant references and so forth related to netchannel2-mapped pages, and vice versa, but that shouldn't be too hard.

> > The basic idea is that everything which can map foreign pages and expose them to the rest of Linux needs to allocate a foreign page tracker (effectively an array of (domid, grant_ref, void *ctxt) tuples), and to register mapped pages with that tracker. It then uses the top few bits of page->private to identify the tracker, and the rest to index into the array. This allows you to forward packets from a foreign domain without knowing whether they were received by NC1 or NC2.
> Well, if it's wrapped by an skb, we can get the skb destructor to handle the cleanup phase. So long as we get the callback, I don't think it should affect the rest of the mechanism.

Cleanup isn't the tricky part. The problem is that you can't forward a packet unless you know which domain it came from and the relevant grant references, because Xen won't let you create grant references on a mapping of another domain's memory. You therefore need some way of translating a struct page in an skb into a (domid_t, grant_ref_t) pair. netback currently handles this with some private lookup tables, but that only works if it's the only thing which can inject foreign mappings into the stack. The foreign map tracker stuff was an attempt to generalise this to work with multiple netback-like drivers.

> > Arguably, blkback should be using this mechanism as well, but since we've gotten away with it so far I thought it'd be best to let sleeping dogs lie.
> > The only time it'd make any difference would be when pages out of a block request somehow get attached to network packets, which seems unlikely.
> Block lifetimes are simpler because there's no cloning and bios have an end_io callback which is more or less equivalent to the skb destructor.

Yes, that's true; the cleanup bit is much easier for block requests, but you still potentially have a forwarding issue. There are a couple of potentially problematic scenarios:

1) You might have nested block devices. Suppose you have three domains (domA, domB, and domC), and a physical block device sdX in domA. DomA could then be configured to run a blkback exposing sdX to domB as xvdY. DomB might then itself run a blkback exposing xvdY to domC as xvdZ. This won't work. Requests issued by domC will be mapped by domB's blkback and injected into its local storage stack, and will eventually reach domB's xvdY blkfront. This will try to grant domA access to the relevant memory, but, because it doesn't know about foreign mappings, it'll grant as if the memory was owned by domB. Xen will then reject domA's attempts to map these domB grants, and every request on xvdZ will fail. Admittedly, that'd be a rather stupid configuration, but it's not currently blocked by the tools (and it'd be rather difficult to block, even if we wanted to).

2) I've not actually checked this, but I suspect we have a problem if you're running an iSCSI initiator in dom0 against a target running in a domU, and then try to expose the SCSI device in dom0 as a block device in some other domU. When requests come in from the blkfront, the dom0 blkback will map them as foreign pages, and then pass them off to the iSCSI initiator. It would make sense for the pages in the block request to get attached to the skb as fragment pages, rather than copied.
When the skb eventually reaches netback, netback will try to do a grant copy into the receiving netfront's buffers (because PG_foreign isn't set), which will fail, because dom0 doesn't actually own the pages. As I say, I've not actually checked whether that's how the initiators work, but it would be a sane implementation if you're talking to a NIC with jumbogram support.

Thinking some more, there's another variant of this bug which doesn't involve block devices at all: bridging between a netfront and a netback. If you have a single bridge with both netfront and netback devices attached to it, and you're not in ALWAYS_COPY_SKB mode, forwarding packets from the netback interface to the netfront one won't work. Packets received by netback will be foreign mappings, but netfront doesn't know that, so when it sends packets to the backend it'll set up grants as if they were in local memory, which won't work. I'm not sure what the right fix for that is; probably just copying the packet in netfront.

Steven.
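The foreign page tracker described in the thread (an array of (domid, grant_ref, void *ctxt) tuples, with the top few bits of page->private selecting the tracker and the rest indexing into its array) can be sketched in userspace C. Everything here is an assumption for illustration: the 4-bit tracker field, the table sizes, the mock struct page with a foreign flag standing in for PG_foreign, and the names foreign_tracker, tracker_register() and lookup_foreign() are all hypothetical, and slot freeing is not modelled. domid_t and grant_ref_t are plain typedefs standing in for the Xen types.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t domid_t;       /* stand-in for Xen's domid_t */
typedef uint32_t grant_ref_t;   /* stand-in for Xen's grant_ref_t */

/* Hypothetical layout: top TRACKER_BITS of ->private pick the tracker,
 * the remaining bits index into that tracker's array. */
#define TRACKER_BITS  4
#define INDEX_BITS    (sizeof(unsigned long) * CHAR_BIT - TRACKER_BITS)
#define INDEX_MASK    ((1UL << INDEX_BITS) - 1)
#define MAX_TRACKERS  (1U << TRACKER_BITS)
#define TRACKER_SLOTS 64        /* per-tracker capacity, arbitrary */

struct foreign_entry {
    domid_t domid;
    grant_ref_t gref;
    void *ctxt;                 /* owning driver's private context */
};

struct foreign_tracker {
    struct foreign_entry slots[TRACKER_SLOTS];
    unsigned int used;
};

/* Mock page: 'foreign' stands in for PG_foreign (or a placeholder
 * ->mapping test); 'private' holds the encoded tracker id and index. */
struct page {
    bool foreign;
    unsigned long private;
};

/* One slot per driver that can inject foreign mappings (NC1, NC2, ...). */
static struct foreign_tracker *trackers[MAX_TRACKERS];

/* Record a newly mapped foreign page with tracker 'id'. */
static bool tracker_register(unsigned int id, struct page *page,
                             domid_t domid, grant_ref_t gref, void *ctxt)
{
    assert(id < MAX_TRACKERS && trackers[id] != NULL);
    struct foreign_tracker *t = trackers[id];
    if (t->used == TRACKER_SLOTS)
        return false;           /* tracker full */
    unsigned long idx = t->used++;
    t->slots[idx] = (struct foreign_entry){ domid, gref, ctxt };
    page->foreign = true;
    page->private = ((unsigned long)id << INDEX_BITS) | idx;
    return true;
}

/* Translate a page into (domid, gref) without knowing which driver
 * mapped it. Returns false for pages in local memory. */
static bool lookup_foreign(const struct page *page,
                           domid_t *domid, grant_ref_t *gref)
{
    if (!page->foreign)
        return false;
    unsigned int id = (unsigned int)(page->private >> INDEX_BITS);
    unsigned long idx = page->private & INDEX_MASK;
    const struct foreign_tracker *t = trackers[id];
    assert(t != NULL && idx < t->used);
    *domid = t->slots[idx].domid;
    *gref = t->slots[idx].gref;
    return true;
}
```

In the bridging case, a netfront-like driver could use the foreign test to decide when it must copy a packet into local memory before granting it to the backend, since it cannot grant memory it does not own.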


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.