
Re: [Xen-devel] Strange kernel BUG() on PV DomU boot



>>> On 22.06.12 at 14:26, Joanna Rutkowska <joanna@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> On 06/22/12 14:21, Joanna Rutkowska wrote:
>> Hello,
>> 
>> From time to time (every several weeks or even less) I run into a
>> strange Dom0 kernel BUG() that manifests itself with the following
>> message (see the end of the message). The Dom0 and VM kernels are 3.2.7
>> pvops, and the Xen hypervisor is 4.1.2, both with only some minor,
>> irrelevant (I think) modifications for Qubes.
>> 
>> The bug is very hard to reproduce, but once this BUG() starts being
>> signaled, it consistently prevents me from starting any new VMs in the
>> system (e.g. I have tried over a dozen times now, and every time the VM
>> boot fails).
>> 
>> The following lines in the VM kernel are responsible for signaling the
>> BUG():
>> 
>>   if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
>>         BUG();
>> 
>> ...yet, there is nothing in the xl dmesg that would provide more info
>> on why this hypercall fails. Ah, that's because there are no printk()s
>> in the hypercall code:
>> 
>>    case VCPUOP_initialise:
>>         if ( v->vcpu_info == &dummy_vcpu_info )
>>             return -EINVAL;
>> 
>>         if ( (ctxt = xmalloc(struct vcpu_guest_context)) == NULL )
>>             return -ENOMEM;
>> 
>>         if ( copy_from_guest(ctxt, arg, 1) )
>>         {
>>             xfree(ctxt);
>>             return -EFAULT;
>>         }
>> 
>>         domain_lock(d);
>>         rc = -EEXIST;
>>         if ( !v->is_initialised )
>>             rc = boot_vcpu(d, vcpuid, ctxt);
>>         domain_unlock(d);
>> 
>>         xfree(ctxt);
>>         break;
>> 
>> So, looking at the above, it seems like it might be failing because
>> xmalloc() fails; however, Xen seems to have enough memory as reported
>> by xl info:
>> 
>> total_memory           : 8074
>> free_memory            : 66
>> free_cpus              : 0
>> 
>> Any ideas what might be the cause?
>> 
>> FWIW, below is the actual oops message.
>> 
> 
> Ok, it seems like this was an out-of-memory condition indeed, because
> once I did:
> 
> xl mem-set 0 1800m
> 
> and then quickly started a VM, it booted fine...

Had you looked at the error value in %rax, you would also
have seen that it's -ENOMEM. I suppose the problem here is
that a multi-page allocation was needed, yet only single
pages were available.
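
For illustration, a minimal sketch of reading such a register value as
a hypercall return code. The helper names and the sample value are
hypothetical (the oops is not reproduced here), and it assumes the
Linux errno numbering, which Xen mirrors for these codes as far as I
know. Only the four errors the VCPUOP_initialise path quoted above can
return are named.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: reinterpret a raw 64-bit register value
 * (e.g. %rax from an oops) as a signed hypercall return code.
 */
static long hypercall_ret_from_reg(uint64_t reg)
{
    return (long)(int64_t)reg;
}

/*
 * Hypothetical helper: name the error codes visible in the quoted
 * VCPUOP_initialise code. Anything else is reported as unknown.
 */
static const char *hypercall_ret_name(long rc)
{
    switch (-rc) {
    case 0:      return "success";
    case EINVAL: return "-EINVAL";
    case ENOMEM: return "-ENOMEM";
    case EFAULT: return "-EFAULT";
    case EEXIST: return "-EEXIST";
    default:     return "unknown";
    }
}
```

With these, a made-up %rax of 0xfffffffffffffff4 decodes to -12, which
hypercall_ret_name() reports as "-ENOMEM".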

> Is there any proposal of how to handle out of memory conditions in Xen
> (like this one, as well as e.g. SWIOTLB problem) in a more user friendly
> way?

In 4.2, I hope we managed to remove all runtime allocations
larger than a page, so the particular situation here should not
arise anymore.

As to more user-friendly - what do you have in mind? An error is an
error (and converting this to a meaningful, user visible message
is the responsibility of the entity receiving the error). In the
case at hand, printing an error message wouldn't meaningfully
increase user-friendliness imo.

> Any recommendations regarding the preferred minimum Xen free memory, as
> reported by xl info, that should be preserved in order to assure Xen
> runs smoothly?

In pre-4.2 Xen, there's not much you can do when memory gets
fragmented (otherwise you'd have to keep more than half the
memory in the box in the hypervisor). With multi-page runtime
allocations gone, you should be fine leaving just a minimal amount
to the hypervisor.
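
To make the fragmentation point concrete, here is a toy model. It is
not Xen's allocator; the function names, the boolean free map, and the
4 KiB page size are assumptions for illustration only. It shows that a
request larger than one page needs an aligned run of contiguous free
pages, which can be missing even when many single pages are free.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Smallest order such that (TOY_PAGE_SIZE << order) >= bytes. */
static unsigned int toy_order_for_bytes(size_t bytes)
{
    unsigned int order = 0;

    while (((size_t)TOY_PAGE_SIZE << order) < bytes)
        order++;
    return order;
}

/*
 * free_map[i] is true when page i is free. Returns true if some
 * naturally aligned run of 2^order pages is entirely free.
 */
static bool toy_can_alloc(const bool *free_map, size_t npages,
                          unsigned int order)
{
    size_t run = (size_t)1 << order;

    for (size_t base = 0; base + run <= npages; base += run) {
        bool all_free = true;

        for (size_t i = 0; i < run; i++) {
            if (!free_map[base + i]) {
                all_free = false;
                break;
            }
        }
        if (all_free)
            return true;
    }
    return false;
}
```

In a map where every other page is free, half the memory is available,
yet toy_can_alloc() succeeds only for order 0. Any request of more than
TOY_PAGE_SIZE bytes (order 1 and up) fails, which is the kind of
situation described above.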

Jan

