
Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork



On Wed, Dec 18, 2019 at 4:02 PM Julien Grall <julien@xxxxxxx> wrote:
>
> Hi,
>
> On 18/12/2019 22:33, Tamas K Lengyel wrote:
> > On Wed, Dec 18, 2019 at 3:00 PM Julien Grall <julien@xxxxxxx> wrote:
> >>
> >> Hi Tamas,
> >>
> >> On 18/12/2019 19:40, Tamas K Lengyel wrote:
> >>> Implement a hypercall that allows a fork to shed all memory that got
> >>> allocated for it during its execution and re-load its vCPU context from
> >>> the parent VM. This allows the forked VM to reset into the same state the
> >>> parent VM is in, faster than creating a new fork would be. Measurements
> >>> show about a 2x speedup during normal fuzzing operations. Performance may
> >>> vary depending on how much memory got allocated for the forked VM. If it
> >>> has been completely deduplicated from the parent VM then creating a new
> >>> fork would likely be more performant.
> >>>
> >>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@xxxxxxxxx>
> >>> ---
> >>>    xen/arch/x86/mm/mem_sharing.c | 105 ++++++++++++++++++++++++++++++++++
> >>>    xen/include/public/memory.h   |   1 +
> >>>    2 files changed, 106 insertions(+)
> >>>
> >>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> >>> index e93ad2ec5a..4735a334b9 100644
> >>> --- a/xen/arch/x86/mm/mem_sharing.c
> >>> +++ b/xen/arch/x86/mm/mem_sharing.c
> >>> @@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
> >>>        return 0;
> >>>    }
> >>>
> >>> +struct gfn_free;
> >>> +struct gfn_free {
> >>> +    struct gfn_free *next;
> >>> +    struct page_info *page;
> >>> +    gfn_t gfn;
> >>> +};
> >>> +
> >>> +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
> >>> +{
> >>> +    int rc;
> >>> +
> >>> +    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
> >>> +    struct gfn_free *list = NULL;
> >>> +    struct page_info *page;
> >>> +
> >>> +    page_list_for_each(page, &cd->page_list)
> >>
> >> AFAICT, your domain is not paused, so it would be possible to have pages
> >> added/removed in that list behind your back.
> >
> > Well, it's not that it's not paused, it's just that I haven't added a
> > sanity check to make sure it is. The toolstack can (and should) pause
> > it, so that sanity check would be warranted.
> I have only read the hypervisor part, so I didn't know what the
> toolstack does.

I've added the same enforced "VM must be paused" check that is present
in the fork hypercall handler.
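
Concretely, something along these lines at the top of the reset handler
(just a sketch; whether the final version checks pause_count or a
dedicated "paused by the toolstack" counter may still change):

    /*
     * Refuse to reset a fork that the toolstack hasn't paused; mirrors
     * the check done when creating the fork in the first place.
     */
    if ( !atomic_read(&cd->pause_count) )
        return -EBUSY;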

>
> >
> >>
> >> You also have multiple loops on the page_list in this function. Given that
> >> page_list can be quite big, this is asking to hog the pCPU and hold the
> >> RCU lock on the domain while the vCPU runs this call.
> >
> > There is just one loop over page_list itself; the second loop is over the
> > internal list being built here, which is a subset. That list should in
> > fact be small (in our tests usually <100 pages).
>
> For a start, nothing in this function tells me that there will be only
> 100 pages. But then, I don't think it is right to implement your
> hypercall based only on the "normal" scenario. You should also think
> about the "worst" case scenario.
>
> In this case the worst case scenario is having hundreds of pages in page_list.

Well, this is only an experimental system that's completely disabled
by default. I think it's fair to assume that people who make use of it
know what they are doing.

>
> > Granted the list can grow larger, but in those cases it's likely better
> > to just discard the fork and create a new one. So in my opinion adding
> > a hypercall continuation to this is not needed.
>
> How would the caller know? What would happen if the caller ends up
> calling this with a growing list?

The caller knows by virtue of knowing how long the VM was executed
for. In the use-case this is targeted at, the VM executes only for a
couple of seconds at most, usually much less than that (we get about
~80 resets/s with AFL). During that time it's extremely unlikely that
more than ~100 pages get deduplicated (that is, written to). But even
if there are more pages, it just means the hypercall takes a bit
longer to run for that iteration. I don't see any issue with not
breaking this hypercall up with a continuation even in the worst case.
But if others feel strongly about having a continuation here, I don't
really mind adding it.
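
If it does turn out to be warranted, my understanding is the usual
pattern would be roughly the following (just a sketch, not something in
this patch; tracking the resume point across the continuation is
hand-waved here):

    /* Inside the loop over the fork's pages: */
    if ( hypercall_preempt_check() )
    {
        /*
         * Bail out and let the caller re-issue the hypercall; some
         * record of how far we got would need to be kept so the next
         * invocation can resume where this one stopped.
         */
        rc = -ERESTART;
        break;
    }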

>
> >
> >>
> >>> +    {
> >>> +        mfn_t mfn = page_to_mfn(page);
> >>> +        if ( mfn_valid(mfn) )
> >>> +        {
> >>> +            p2m_type_t p2mt;
> >>> +            p2m_access_t p2ma;
> >>> +            gfn_t gfn = mfn_to_gfn(cd, mfn);
> >>> +            mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> >>> +                                        0, NULL, false);
> >>> +            if ( p2m_is_ram(p2mt) )
> >>> +            {
> >>> +                struct gfn_free *gfn_free;
> >>> +                if ( !get_page(page, cd) )
> >>> +                    goto err_reset;
> >>> +
> >>> +                /*
> >>> +                 * We can't free the page while iterating over the page_list
> >>> +                 * so we build a separate list to loop over.
> >>> +                 *
> >>> +                 * We want to iterate over the page_list instead of checking
> >>> +                 * gfn from 0 to max_gfn because this is ~10x faster.
> >>> +                 */
> >>> +                gfn_free = xmalloc(struct gfn_free);
> >>
> >> If I did the math right, for a 4G guest this will require ~24MB of
> >> memory. Actually, is it really necessary to do the allocation for such
> >> a short period of time?
> >
> > If you have a fully deduplicated fork then you should not be using
> > this function to begin with. You get better performance by throwing
> > that one away and creating a new one.
>
> How does a user know when/how this can be called? But then, as said above,
> this may be called by mistake... So I still think you need to be prepared
> for the worst case.

See my answer above.
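
(For what it's worth, the ~24MB figure does look like the right worst
case, assuming no xmalloc overhead or struct padding:

    4 GiB / 4 KiB per page         = 1,048,576 entries
    sizeof(struct gfn_free)        = 8 + 8 + 8 = 24 bytes
    1,048,576 entries * 24 bytes   = 24 MiB

but as said, a fork that has diverged that much should just be
discarded rather than reset.)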

>
> > As for using xmalloc here, I'm
> > not sure what other way I have to build a list of pages that need to
> > be freed. I can't free the page itself while I'm iterating over
> > page_list (that I'm aware of). The only other option available is
> > calling __get_gfn_type_access with gfn=0..max_gfn, which will be
> > extremely slow because you have to loop over a lot of holes.
> You can use page_list_for_each_safe(). This is already used by functions
> such as relinquish_memory().

Neat, I'll check it out! It would certainly simplify things not having
to build a private list.
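
Something along these lines is what I have in mind now (untested
sketch; p2m, cd and rc come from the enclosing function, and the
set_entry()/put_page_alloc_ref()/put_page() sequence for dropping and
freeing each page is my assumption of how the freeing should look, so
the actual next version may differ):

    struct page_info *page, *tmp;

    page_list_for_each_safe ( page, tmp, &cd->page_list )
    {
        p2m_type_t p2mt;
        p2m_access_t p2ma;
        gfn_t gfn;
        mfn_t mfn = page_to_mfn(page);

        if ( !mfn_valid(mfn) )
            continue;

        gfn = mfn_to_gfn(cd, mfn);
        mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
                                    0, NULL, false);

        if ( !p2m_is_ram(p2mt) )
            continue;

        /* Keep a reference while the page is dropped from the p2m. */
        if ( !get_page(page, cd) )
            continue;

        /* Remove the p2m entry, then free the page back to the allocator. */
        rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
                            p2m_invalid, p2m_access_rwx, -1);
        if ( rc )
            break;

        put_page_alloc_ref(page);
        put_page(page);
    }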

>
> >
> >>
> >> What are you trying to achieve by iterating twice on the GFN? Wouldn't
> >> it be easier to pause the domain?
> >
> > I'm not sure what you mean; where do you see me iterating twice on the
> > gfn? And what does pausing have to do with it?
>
> It was unclear why you decided to use a double loop here. You explained
> it above, so this can be discarded.

OK, cool.

Thanks,
Tamas
