
Re: [Xen-devel] [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers



On 19/05/14 10:37, Andrew Cooper wrote:
> On 17/05/14 17:01, Jason Andryuk wrote:
>> xc_domain_resume() expects the guest to be in state SHUTDOWN_suspend.
>> However, nothing verifies the state before modify_returncode() modifies
>> the domain's registers.  This will crash guest processes or the kernel
>> itself.
>>
>> This can be demonstrated with `LIBXL_SAVE_HELPER=/bin/false xl migrate`.
>>
>> Signed-off-by: Jason Andryuk <andryuk@xxxxxxxx>
> Hmm.
>
> There is no possible way whatsoever that migration can work if a PV
> guest is not in SHUTDOWN_suspend.  PV guests have to leave an MFN in edx
> which the toolstack rewrites with a new MFN on resume.
>
> By default, there is no need for knowledge from the HVM guest for
> migrate.  XenServer is perfectly capable of migrating HVM VMs without PV
> drivers.  I suspect therefore that we never use cooperative resume.
>
> This cooperative resume which modifies guest register state therefore
> imposes the same SHUTDOWN_suspend restriction on HVM guests as it does
> for PV guests.  As a result, your patch below is correct as a fallback
> safety measure, and should be taken.
>
> However the caller of modify_returncode is also at fault for attempting
> to resume an already-running domain.  I think there needs to be a bugfix
> there as well.  I presume that some piece of code is assuming that
> despite libxl-save-helper failing, xc_domain_safe() paused the guest,
> which is clearly not true in this case.
>
> ~Andrew

And here, I actually mean xc_domain_save()

~Andrew
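
For reference, the state test both the patch and a caller-side fix would rely on can be sketched in isolation. This is only an illustration under stated assumptions: struct dominfo_sketch and domain_is_suspended() are hypothetical names standing in for the two xc_dominfo_t fields the patch reads, and SHUTDOWN_suspend is redefined locally so the fragment builds without the Xen headers.

```c
#include <stdbool.h>

/* Value of SHUTDOWN_suspend in Xen's public sched.h, repeated here so the
 * sketch builds without the Xen headers. */
#define SHUTDOWN_suspend 2

/* Hypothetical stand-in for the two xc_dominfo_t fields the patch tests. */
struct dominfo_sketch {
    bool shutdown;                /* the domain has shut down */
    unsigned int shutdown_reason; /* only meaningful when shutdown is set */
};

/* True only when the guest has stopped with reason SHUTDOWN_suspend,
 * i.e. the one state in which rewriting its registers is safe. */
static bool domain_is_suspended(const struct dominfo_sketch *info)
{
    return info->shutdown && info->shutdown_reason == SHUTDOWN_suspend;
}
```

A caller could evaluate a predicate like this before attempting a resume at all, which would leave the check inside modify_returncode() as the fallback safety measure described above.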

>
>> ---
>>
>> This change stops xc_domain_resume from killing my domUs on a failed
>> migration.  I'm using a wrapper around libxl-save-helper which may fail
>> before libxl-save-helper is invoked, so xc_domain_save has not been
>> called.  The idle Linux domU kernels would BUG coming out of
>> SCHEDOP_block in xen_safe_halt() since modify_returncode set EAX to 1.
>> journald was also observed to segfault.
>>
>> As written, this code treats calling xc_domain_resume on a running
>> domain as an error.  Do we want it silently ignored?  Output with this
>> patch looks like:
>>
>> """
>> Migration failed, resuming at sender.
>> xc: error: Domain not in suspended state: Internal error
>> libxl: error: libxl.c:402:libxl__domain_resume: xc_domain_resume failed for domain 92: Interrupted system call
>> """
>>
>> libxl__domain_resume prints errno, but it is stale for this case.
>> xc_domain_resume_cooperative could swallow modify_returncode's error,
>> bypass issuing XEN_DOMCTL_resumedomain, and return success to avoid the
>> libxl error message.
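
The "silently ignore" alternative described above might look roughly like the following. This is a sketch under assumptions, not the libxc code: the *_sketch names are hypothetical, the domain state is modelled by a small struct instead of xc_dominfo_t, and a boolean flag marks the point where real code would issue XEN_DOMCTL_resumedomain.

```c
#include <stdbool.h>

/* Value of SHUTDOWN_suspend in Xen's public sched.h, repeated here so the
 * sketch builds without the Xen headers. */
#define SHUTDOWN_suspend 2

/* Hypothetical stand-in for the xc_dominfo_t fields used by the check. */
struct dominfo_sketch {
    bool shutdown;
    unsigned int shutdown_reason;
};

/* Mirrors the patched check: returns 1 when the domain is not suspended
 * (registers left untouched) and 0 when the return code could be modified. */
static int modify_returncode_sketch(const struct dominfo_sketch *info)
{
    if (!info->shutdown || info->shutdown_reason != SHUTDOWN_suspend)
        return 1;
    return 0;
}

/* Hypothetical cooperative-resume wrapper that swallows the "not suspended"
 * result. *resume_issued is set where the resume domctl would be issued. */
static int resume_cooperative_sketch(const struct dominfo_sketch *info,
                                     bool *resume_issued)
{
    int rc;

    *resume_issued = false;

    rc = modify_returncode_sketch(info);
    if (rc > 0)
        return 0;   /* domain never suspended: nothing to undo */
    if (rc < 0)
        return rc;  /* placeholder for real failures; unused in this sketch */

    *resume_issued = true;
    return 0;
}
```

The trade-off is that a caller such as libxl__domain_resume would no longer see an error at all, so resuming a running domain by mistake would go unreported.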
>>
>> ---
>>  tools/libxc/xc_resume.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
>> index 18b4818..9ec6a59 100644
>> --- a/tools/libxc/xc_resume.c
>> +++ b/tools/libxc/xc_resume.c
>> @@ -39,6 +39,12 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
>>          return -1;
>>      }
>>  
>> +    if ( !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) )
>> +    {
>> +        ERROR("Domain not in suspended state");
>> +        return 1;
>> +    }
>> +
>>      if ( info.hvm )
>>      {
>>          /* HVM guests without PV drivers have no return code to modify. */
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

