On 26.08.13 at 19:00, Tomasz Wroblewski<tomasz.wroblewski@xxxxxxxxxx> wrote:
I've verified that replacing security_load_policy call
completely with the following allocation/deallocaiton is enough to cause
this crash:
//ret = security_load_policy(policy_buffer, policy_size);
{
void ** p = xmalloc_array(void*, 768);
xfree(p);
}
Note that this allocation succeeds, and also if you would not call xfree
(which is not called if say a policy was succesfully loaded), there is
no crash. So yeah my original patch accidentaly fixes it by just
avoiding the alloc/free completely.
But I then understand, together with the below, that the crash isn't
down the xfree() path, ...
The shaky manually constructed call graph for the assertion failure:
setup.c: init_idle_domain
schedule.c: scheduler_init
domain.c: domain_create
domain.c: alloc_domain_struct
domain.c: alloc_xenheap_pages
..
page_alloc.c: alloc_heap_pages
flushtlb.h: flush_tlb_mask
flushtlb.h: flush_mask
smp.c: flush_area_mask - hits ASSERT because interrupts are disabled here
... but instead is on a _subsequent_ allocation. Hence the prior
free gets a heap page into a state that makes in non-suitable for
re-use. But you certainly noticed that free_heap_pages() sets a
page's u.free.need_tlbflush only if the page had an owner,
which shouldn't be the case for Xen-internal allocations.
With that, I think I can see where the bug really is: The owner
field (v.inuse._domain) is in a union with the order field re-used
by the xmalloc() implementation for whole page allocations. The
fix therefore ought to be as simple as the patch below.
Jan
--- a/xen/common/xmalloc_tlsf.c
+++ a/xen/common/xmalloc_tlsf.c
@@ -629,6 +629,7 @@ void xfree(void *p)
unsigned int i, order = get_order_from_pages(size);
BUG_ON((unsigned long)p& ((PAGE_SIZE<< order) - 1));
+ PFN_ORDER(virt_to_page(p)) = 0;
for ( i = 0; ; ++i )
{
if ( !(size& (1<< i)) )