>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 30.04.08 16:26 >>>
>On 30/4/08 15:00, "Jan Beulich" <jbeulich@xxxxxxxxxx> wrote:
>>
>> According to two forced backtraces with about a second delta, the
>> hypervisor is in the process of releasing the 1:1 mapping of the
>> guest kernel and managed, during that one second, to increment
>> i in free_l3_table() by just 1. This would amount to an unbelievable
>> 13,600 clocks per L1 entry being freed.
>
>That's not great. :-) At such a high cost, perhaps some tracing might
>indicate if we are taking some stupid slow path in free_domheap_page() or
>cleanup_page_cacheattr()? I very much hope that 13600 cycles cannot be
>legitimately accounted for!
I'm afraid it really is that bad. I used another machine (local to my
office), so the numbers aren't quite as bad as on the box where they
were originally measured, but after collecting the cumulative clock
cycles spent in free_l1_table() and free_domheap_pages() (and their
descendants, so the former obviously includes a large part of the
latter) during the largest single run of relinquish_memory(), I get an
average of 3,400 clocks spent in free_domheap_pages() (with all but
very few pages going onto the scrub list) and 8,500 clocks spent per
page table entry in free_l1_table() (assuming all entries are
populated, so the real number is higher still).
It's the relationship between the two numbers that makes me believe
this much time really is being spent there.
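(For reference, nothing fancy is needed to collect such numbers; the
sketch below is purely illustrative and not the exact instrumentation I
used - the wrapper name and the global accumulator are made up. It just
sums TSC deltas around the function of interest, to be read out later,
e.g. from a debug key handler.)

/* Illustrative only - not the actual instrumentation. */
static uint64_t l1_free_cycles;

static void timed_free_l1_table(struct page_info *page)
{
    uint64_t t0, t1;

    rdtscll(t0);
    free_l1_table(page);            /* the function being measured */
    rdtscll(t1);
    l1_free_cycles += t1 - t0;
}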
For the specific case of cleaning up after a domain, there seems to
be a pretty simple workaround, though: free_l{3,4}_table() can
simply avoid recursing into put_page_from_l{3,4}e() when
d->arch.relmem is RELMEM_dom_l{3,4}. As expected, this reduces the
latency of preempting relinquish_memory() (for a 5G domU) on the box
I tested from about 3s to less than half a second - if that's still
considered too much, the same kind of check could of course be added
to free_l2_table().
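To make clearer what I have in mind, here is roughly what the L3
variant would look like (sketched from memory rather than copied from
xen/arch/x86/mm.c; error handling and the PAE adjustments are
omitted):

static void free_l3_table(struct page_info *page)
{
    struct domain *d = page_get_owner(page);
    unsigned long pfn = page_to_mfn(page);
    l3_pgentry_t *pl3e = map_domain_page(pfn);
    unsigned int i;

    for ( i = 0; i < L3_PAGETABLE_ENTRIES; i++ )
    {
        if ( !is_guest_l3_slot(i) )
            continue;
        /* Proposed shortcut: while relinquish_memory() is tearing down
         * the domain's L3 tables, the pages referenced here get cleaned
         * up by the later passes anyway, so skip the costly recursion
         * into put_page_from_l3e(). */
        if ( d->arch.relmem == RELMEM_dom_l3 )
            continue;
        put_page_from_l3e(pl3e[i], pfn);
    }

    unmap_domain_page(pl3e);
}

The free_l4_table() change would be the analogous check against
RELMEM_dom_l4.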
But since there's no similarly simple mechanism to deal with the DoS
potential in pinning/unpinning or in installing L4 (and maybe L3)
table entries, there will need to be a way to preempt these call
trees anyway. Since hypercalls cannot nest, storing the respective
state in the vcpu structure shouldn't be a problem; what I'm unsure
about is what side effects a partially validated page table might
introduce.
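Purely to illustrate what I mean by storing state in the vcpu
structure (all names below are invented, nothing like this exists
today), something along these lines hanging off struct arch_vcpu
would do:

/* Hypothetical - type and field names made up for illustration only. */
struct ptbl_continuation {
    struct page_info *page;   /* table whose (in)validation was cut short */
    unsigned int      level;  /* 4, 3, 2, or 1 */
    unsigned int      next;   /* first entry still to be processed */
};

Resuming would then simply restart the walk at ->next when the
hypercall gets continued.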
While looking at this I wondered whether there really is a way for
Xen heap pages to end up being guest page tables (or, similarly,
descriptor tables)? I would think that if this happened it would be
a bug (and perhaps a security issue). If it cannot happen, then the
RELMEM_* states could be simplified and
domain_relinquish_resources() shortened.
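For reference (quoting from memory, so the exact spelling may be off),
the current set of states is roughly

    enum {
        RELMEM_not_started,
        RELMEM_xen_l4,
        RELMEM_dom_l4,
        RELMEM_xen_l3,
        RELMEM_dom_l3,
        RELMEM_xen_l2,
        RELMEM_dom_l2,
        RELMEM_done,
    } relmem;

and it would be the RELMEM_xen_l* stages (i.e. the relinquish_memory()
passes over d->xenpage_list) that could go away.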
(I was traveling, so it took a while before I got around to doing the
measurements.)
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel