
Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing



> -----Original Message-----
> From: Ian Campbell [mailto:Ian.Campbell@xxxxxxxxxx]
> Sent: Monday, August 05, 2013 10:53 PM
> To: Stefano Stabellini
> Cc: Jaeyong Yoo; xen-devel@xxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write
> fault for dirty-page tracing
> 
> On Mon, 2013-08-05 at 12:11 +0100, Stefano Stabellini wrote:
> > On Mon, 5 Aug 2013, Jaeyong Yoo wrote:
> > > > -----Original Message-----
> > > > From: Stefano Stabellini [mailto:stefano.stabellini@xxxxxxxxxxxxx]
> > > > Sent: Monday, August 05, 2013 1:28 AM
> > > > To: Jaeyong Yoo
> > > > Cc: xen-devel@xxxxxxxxxxxxx
> > > > Subject: Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling
> > > > write fault for dirty-page tracing
> > > >
> > > > On Thu, 1 Aug 2013, Jaeyong Yoo wrote:
> > > > > Add handling write fault in do_trap_data_abort_guest for
> > > > > dirty-page
> > > > tracing.
> > > > > Rather than maintaining a bitmap for dirty pages, we use the
> > > > > avail bit
> > > > in p2m entry.
> > > > > For locating the write fault pte in guest p2m, we use
> > > > > virtual-linear page table that slots guest p2m into xen's virtual
> > > > > memory.
> > > > >
> > > > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@xxxxxxxxxxx>
> > > >
> > > > Looks good to me.
> > > > I would appreciate some more comments in the code to explain the
> > > > inner workings of the vlp2m.
> > > I got it.
> > >
> > > One question: if you look at patch #6, it implements the allocation and
> > > freeing of vlp2m memory (xen/arch/arm/vlpt.c), which is almost the same
> > > as vmap allocation (xen/arch/arm/vmap.c). To be honest, I copied
> > > vmap.c and changed the virtual address start/end points and the name.
> > > While I was doing that, I thought it would be better if we made a
> > > common interface, something like a virtual address allocator. That is,
> > > if we create a virtual address allocator
> > >
> > > giving the VA range from A to B, the allocator allocates VAs between
> > > A and B, and we initialize the virtual allocator instance at boot
> > > time.
> >
> > Good question. I think it might be best to improve the current vmap
> > (it's actually xen/common/vmap.c) so that we can have multiple vmap
> > instances for different virtual address ranges at the same time.
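For illustration, a minimal sketch of what such a per-range allocator interface could look like. The names and the trivial bump allocation are assumptions made up for this sketch; a real version would track frees, as vmap does:

    /* Hedged sketch, not the vmap rework itself: one allocator instance per
     * VA range, initialised once (e.g. at boot), handing out addresses from
     * within that range only. */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    struct va_space {
        uintptr_t cur, end;            /* [cur, end) is still available */
    };

    /* Initialise an allocator instance over the range [start, end). */
    static void va_space_init(struct va_space *s, uintptr_t start, uintptr_t end)
    {
        s->cur = start;
        s->end = end;
    }

    /* Allocate 'size' bytes of VA, 4K-aligned; returns 0 when exhausted. */
    static uintptr_t va_alloc(struct va_space *s, size_t size)
    {
        uintptr_t va = (s->cur + 0xfff) & ~(uintptr_t)0xfff;
        if (va > s->end || size > s->end - va)
            return 0;
        s->cur = va + size;
        return va;
    }

    int main(void)
    {
        struct va_space vlpt;
        /* e.g. an instance covering the 128MB-256MB gap mentioned below */
        va_space_init(&vlpt, 0x08000000UL, 0x10000000UL);
        printf("first allocation at %#lx\n",
               (unsigned long)va_alloc(&vlpt, 8 << 20));
        return 0;
    }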
> 
> Before we go off and do that:
> 
> I don't think this patch implements a linear p2m mapping in the sense in
> which I intended it when I suggested it. The patch implements a manual
> lookup with a kind of cache of the resulting mapping, I think.
> 
> A linear mapping means inserting the current p2m base pointer into Xen's
> own pagetables in such a way that you can access a leaf node of the p2m by
> dereferencing a virtual address. Given this setup there should be no need
> for on-demand mapping as part of the log-dirty stuff, all the smarts
> happen at context switch time.
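To make the "no software walk, just dereference a VA" point concrete, here is a minimal sketch (not the patch's code: VLPT_BASE, the example IPA and the lack of any region offset are assumptions for illustration only):

    /* Hedged sketch: assumes the p2m third-level entries are linearly
     * mapped at a hypothetical VLPT_BASE, one 64-bit LPAE entry per 4K
     * guest page.  A real implementation would subtract the base IPA of
     * the region covered by the mapping. */
    #include <stdint.h>
    #include <stdio.h>

    #define VLPT_BASE   0x08000000UL   /* hypothetical base for the linear map */
    #define PAGE_SHIFT  12

    typedef uint64_t lpae_t;           /* an LPAE descriptor is 64 bits wide */

    /* VA at which Xen would see the leaf p2m entry for guest address 'ipa'. */
    static uintptr_t leaf_pte_va(uint64_t ipa)
    {
        return VLPT_BASE + (uintptr_t)((ipa >> PAGE_SHIFT) * sizeof(lpae_t));
    }

    int main(void)
    {
        printf("leaf PTE for IPA 0x40201000 is at VA %#lx\n",
               (unsigned long)leaf_pte_va(0x40201000ULL));
        return 0;
    }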
> 
> Normally a linear memory map is done by creating a loop in the page tables,
> i.e. HTTBR[N] would contain an entry which referenced HTTBR again. In this
> case we actually have a separate p2m table which we want to stitch into
> the normal tables, which makes it a bit different to the classical case.
> 
> Let's assume both Xen's page tables and the p2m are two-level, to simplify
> the ascii art.
> 
> So for the P2M you have:
> VTTBR
> `-------> P2M FIRST
>           `----------> P2M SECOND
>                        `-------------GUEST RAM
> 
> Now if we arrange that Xen's page tables contains the VTTBR in a top level
> page table slot:
> HTTBR
> `-------> VTTBR
>           `----------> P2M FIRST
>                        `-------------P2M SECOND, ACCESSED AS XEN RAM
> 
> So now Xen can access the leaf PTEs of the P2M directly, just by using the
> correct virtual address.
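Roughly, the stitching described above amounts to writing, at context-switch time, an LPAE table descriptor that points at the p2m's top-level page into the chosen Xen page-table slot. A hedged sketch with an invented slot index, address and helper name (only the table-descriptor encoding is architectural):

    /* Hedged sketch, not Xen code: build an LPAE "table" descriptor
     * (bits[1:0] = 0b11, next-level table address in bits [39:12]) and
     * write it into a stand-in Xen page-table page. */
    #include <stdint.h>

    #define LPAE_TABLE   0x3ULL                  /* valid + table */
    #define ADDR_MASK    0x000000FFFFFFF000ULL   /* output address, bits [39:12] */

    static uint64_t xen_slots[512];              /* stand-in for a Xen PT page */

    /* Make Xen slot 'idx' walk into the page-table page at physical 'pt_paddr'. */
    static void stitch_p2m(unsigned int idx, uint64_t pt_paddr)
    {
        xen_slots[idx] = (pt_paddr & ADDR_MASK) | LPAE_TABLE;
        /* A real implementation would follow this with barriers and a TLB
         * flush for the affected VA range. */
    }

    int main(void)
    {
        stitch_p2m(64, 0x80004000ULL);           /* hypothetical p2m page */
        return 0;
    }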
> 
> This can be slightly tricky if P2M FIRST can contain super page mappings,
> since you need to arrange to stop a level sooner to get the correct PT
> entry. This means we need to arrange for a second virtual address region
> which maps to that, by arranging for a loop in the page table, e.g.
> 
> HTTBR
> `-------> HTTBR
>           `----------> VTTBR
>                        `-------------P2M FIRST, ACCESSED AS XEN RAM
> 
> Under Xen, which uses LPAE and 3-level tables, I think the P2M SECOND
> would require 16 first-level slots in the xen_second tables, which need to
> be context switched; the regions needed to hit the super page mappings
> would need slots too. If we use the gap between 128M and 256M in the Xen
> memory map then that means we are using xen_second[64..80]=p2m[0..16] for
> the linear map of the p2m leaf nodes.
> We can then use xen_second[80..144] to point back to xen_second, allowing
> xen_second[64..80] to be dereferenced and creating the loop needed to map
> the superpage ptes in the P2M.
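A quick back-of-the-envelope check of those slot numbers, assuming the usual 4K LPAE granule where each second-level slot maps 2MB of VA:

    /* Sanity check of the slot arithmetic, nothing Xen-specific. */
    #include <stdio.h>

    #define SECOND_SHIFT 21                       /* 2MB per second-level entry */

    int main(void)
    {
        printf("128MB starts at slot %lu\n", 0x08000000UL >> SECOND_SHIFT); /* 64  */
        printf("256MB starts at slot %lu\n", 0x10000000UL >> SECOND_SHIFT); /* 128 */
        printf("16 slots map %luMB of VA\n", (16UL << SECOND_SHIFT) >> 20); /* 32  */
        return 0;
    }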
> 
> So given
> VTTBR->P2M FIRST->P2M SECOND->P2M THIRD->GUEST RAM
> 
> We have in the Xen mappings:
> HTTBR->XEN_SECOND[64..80]->P2M FIRST[0..16]->P2M SECOND->P2M THIRD AS XEN RAM
> HTTBR->XEN_SECOND[80..144]->XEN_SECOND(*)->P2M FIRST[0..16]->P2M SECOND AS XEN RAM
> 
> (*) here we only care about XEN_SECOND[64..80] but the loop maps
> XEN_SECOND[0..512], a larger region which we can safely ignore.
> 
> So if my maths is correct this means Xen can access P2M THIRD entries at
> virtual addresses 0x8000000..0xa000000 and P2M SECOND entries at
> 0x12000000..0x14000000, which means that the fault handler just needs to
> lookup the P2M SECOND to check it isn't a super page mapping and then lookup
> the P2M THIRD to mark it dirty etc.
> 
> If for some reason we also need to access P2M FIRST efficiently we could
> add a third region, but I don't think we will be doing 1GB P2M mappings
> for the time being.
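Under those assumptions, the fault-handling step described above might look roughly like the following. This is not the patch's code: the choice of dirty bit from the LPAE software-defined bits [58:55], the helper name and the stand-in arrays are hypothetical; only the descriptor-type encodings are architectural.

    /* Hedged sketch: check the second-level entry for a superpage (block)
     * mapping, otherwise mark the third-level entry dirty in one of the
     * LPAE "ignored" bits.  'second' and 'third' stand in for the linearly
     * mapped p2m levels; in Xen they would be fixed virtual addresses. */
    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define PAGE_SHIFT      12
    #define SECOND_SHIFT    21
    #define LPAE_TYPE_MASK  0x3ULL
    #define LPAE_TYPE_BLOCK 0x1ULL         /* level-2 block (superpage) descriptor */
    #define P2M_DIRTY_BIT   (1ULL << 55)   /* hypothetical: one of bits [58:55] */

    typedef uint64_t lpae_t;

    static bool handle_dirty_fault(lpae_t *second, lpae_t *third, uint64_t ipa)
    {
        if ((second[ipa >> SECOND_SHIFT] & LPAE_TYPE_MASK) == LPAE_TYPE_BLOCK)
            return false;                  /* 2MB superpage: handle separately */

        third[ipa >> PAGE_SHIFT] |= P2M_DIRTY_BIT;   /* record the dirty page */
        /* A real handler would also restore write permission and flush. */
        return true;
    }

    int main(void)
    {
        static lpae_t second[1UL << (32 - SECOND_SHIFT)]; /* 2048 entries for 4GB */
        static lpae_t third[1UL << 20];                   /* 1M entries for 4GB   */
        uint64_t ipa = 0x40201000ULL;

        second[ipa >> SECOND_SHIFT] = 0x3;                /* pretend: a table entry */
        if (handle_dirty_fault(second, third, ipa))
            printf("page at IPA %#llx marked dirty\n", (unsigned long long)ipa);
        return 0;
    }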
> 
> It occurs to me now that with 16 slots changing on context switch and a
> further 16 aliasing them (and hence requiring maintenance too) for the
> super pages it is possible that the TLB maintenance at context switch
> might get prohibitively expensive. We could address this by firstly only
> doing it when switching to/from domains which have log dirty mode enabled
> and then secondly by seeing if we can make use of global or locked down
> mappings for the static Xen .text/.data/.xenheap mappings and therefore
> allow us to use a bigger global flush.
> 
> In hindsight, the cost of doing the domain_map_page walk on each lookup
> might be offset by the need to do all that TLB maintenance on context
> switch. It may be that this is something we can only resolve by measuring?
> 
> BTW, eventually we will have a direct map of all RAM for 64-bit only, so
> we would likely end up with different schemes for p2m lookups for the two
> sub-arches, since in the 64-bit direct-map case domain_map_page is very
> cheap.
> 
> I hope my description of a linear map makes sense, hard to do without a
> whiteboard ;-)

Thanks a lot for the ascii art! Even without a whiteboard, it works very
nicely for me :)

I think I understand your points. Previously, in my implementation, I
created the Xen mapping to the leaf PTEs of the P2M by looking up the
guest's leaf p2m and calling create_xen_table, but everything becomes
simpler if I just map xen_second to the guest's P2M first table. Then, by
just reading the correct VA, I can immediately access the leaf PTEs of the
guest p2m.

As a minor issue, I don't quite understand your numbers: XEN_SECOND[64..80]
for the p2m third and XEN_SECOND[80..144] for the p2m second. I think the
p2m third should need a larger VA range than the p2m second.

If I'm not mistaken, migrating a domU with 4GB of memory requires a VA
range of 8MB for the p2m third and 16KB for the p2m second (quick
arithmetic below). Since we have 128MB of VA for the vlpt, how about
allocating a different range within that 128MB to each migrating domU?
That way we would not need to context switch the xen_second page tables
at all. It limits how many large-memory domUs can be live migrated
simultaneously, but for ARM I think that is reasonable.
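The arithmetic behind those sizes, as a quick standalone check (nothing Xen-specific; 8-byte LPAE entries, 4KB pages, 2MB second-level coverage):

    /* Quick check of the VA sizes quoted above. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t guest_ram = 4ULL << 30;            /* 4GB domU */
        uint64_t third  = (guest_ram >> 12) * 8;    /* one entry per 4KB page */
        uint64_t second = (guest_ram >> 21) * 8;    /* one entry per 2MB      */

        printf("p2m third : %llu MB\n", (unsigned long long)(third >> 20));  /* 8  */
        printf("p2m second: %llu KB\n", (unsigned long long)(second >> 10)); /* 16 */
        return 0;
    }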


Best,
Jaeyong

> 
> Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

