
To: YAMAMOTO Takashi <yamamoto@xxxxxxxxxxxxx>
Subject: Re: XCP: signal -7 (Re: [Xen-devel] XCP: sr driver question wrt vm-migrate)
From: Jonathan Ludlam <Jonathan.Ludlam@xxxxxxxxxxxxx>
Date: Fri, 18 Jun 2010 13:53:49 +0100
Accept-language: en-US
Acceptlanguage: en-US
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Fri, 18 Jun 2010 05:54:37 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20100618024544.485DC71A53@xxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20100617095207.1107670E44@xxxxxxxxxxxxxxxx> <20100618024544.485DC71A53@xxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcsO5UyRUedjQfz0TDuk6LZ1HQSwKQ==
Thread-topic: XCP: signal -7 (Re: [Xen-devel] XCP: sr driver question wrt vm-migrate)

Very strange indeed. -7 is SIGKILL: xapi is written in OCaml, and the OCaml 
runtime uses its own signal numbering, in which Sys.sigkill is -7.

Firstly, is this 0.1.1 or 0.5-RC? If it's 0.1.1 could you retry it on 0.5 just 
to check if it's already been fixed?

Secondly, can you check whether xapi has started properly on both machines (i.e. 
the init script completed successfully)? I believe that if the init script 
doesn't detect that xapi has started correctly it might kill it. This is about 
the only thing we can think of that might cause the problem you described.
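
A quick way to poke xapi from outside the box -- this is only a rough sketch, 
assuming the XenAPI.py bindings from the SDK are importable and that you 
substitute real credentials -- is to try opening a session against each host:

    import XenAPI   # XenAPI.py, shipped with the XCP SDK

    def xapi_alive(host, user, password):
        """Return True if xapi on 'host' accepts a login and can list its hosts."""
        session = XenAPI.Session("https://%s" % host)
        try:
            session.xenapi.login_with_password(user, password)
        except Exception, e:
            print "xapi on %s is not responding: %s" % (host, e)
            return False
        try:
            for ref in session.xenapi.host.get_all():
                print session.xenapi.host.get_name_label(ref), \
                      "enabled:", session.xenapi.host.get_enabled(ref)
            return True
        finally:
            session.xenapi.session.logout()

If the connection is refused (like the PERFMON 'Connection refused' lines in 
your log), xapi isn't up at all.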

Cheers,

Jon


On 18 Jun 2010, at 03:45, YAMAMOTO Takashi wrote:

> hi,
> 
> i got another error on vm-migrate.
> "signal -7" in the log seems interesting.  does this ring a bell?
> 
> YAMAMOTO Takashi
> 
> + date
> Thu Jun 17 21:51:44 JST 2010
> + xe vm-migrate live=true uuid=23ecfa58-aa30-ea6a-f9fe-7cb2a5487592 
> host=67b8b07b-8c50-4677-a511-beb196ea766f
> Lost connection to the server.
> 
> /var/log/messages:
> 
> Jun 17 21:51:40 s1 ovs-cfg-mod: 
> 00007|cfg_mod|INFO|-port.vif4958.1.vm-uuid=23ecfa58-aa30-ea6a-f9fe-7cb2a5487592
> Jun 17 21:51:41 s1 xapi: [ warn|s1|2416799 unix-RPC|VM.pool_migrate 
> R:832813c0722b|hotplug] Warning, deleting 'vif' entry from 
> /xapi/4958/hotplug/vif/1
> Jun 17 21:51:41 s1 xapi: [error|s1|90 xal_listen|VM (domid: 4958) 
> device_event = ChangeUncooperative false D:0e953bd99071|event] device_event 
> could not be processed because VM record not in database
> Jun 17 21:51:47 s1 xapi: [ warn|s1|2417066 inet_rpc|VM.pool_migrate 
> R:fee54e870a4e|xapi] memory_required_bytes = 1080033280 > memory_static_max = 
> 1073741824; clipping
> Jun 17 21:51:57 s1 xenguest: Determined the following parameters from 
> xenstore:
> Jun 17 21:51:57 s1 xenguest: vcpu/number:1 vcpu/affinity:0 vcpu/weight:0 
> vcpu/cap:0 nx: 0 viridian: 1 apic: 1 acpi: 1 pae: 1 acpi_s4: 0 acpi_s3: 0
> Jun 17 21:52:43 s1 scripts-vif: Called as "add vif" domid:4959 devid:1 
> mode:vswitch
> Jun 17 21:52:44 s1 scripts-vif: Called as "online vif" domid:4959 devid:1 
> mode:vswitch
> Jun 17 21:52:46 s1 scripts-vif: Adding vif4959.1 to xenbr0 with address 
> fe:ff:ff:ff:ff:ff
> Jun 17 21:52:46 s1 ovs-vsctl: Called as br-to-vlan xenbr0
> Jun 17 21:52:49 s1 ovs-cfg-mod: 00001|cfg|INFO|using "/etc/ovs-vswitchd.conf" 
> as configuration file, "/etc/.ovs-vswitchd.conf.~lock~" as lock file
> Jun 17 21:52:49 s1 ovs-cfg-mod: 00002|cfg_mod|INFO|configuration changes:
> Jun 17 21:52:49 s1 ovs-cfg-mod: 
> 00003|cfg_mod|INFO|+bridge.xenbr0.port=vif4959.1
> Jun 17 21:52:49 s1 ovs-cfg-mod: 
> 00004|cfg_mod|INFO|+port.vif4959.1.net-uuid=9ca059b1-ac1e-8d3f-ff19-e5e74f7b7392
> Jun 17 21:52:49 s1 ovs-cfg-mod: 
> 00005|cfg_mod|INFO|+port.vif4959.1.vif-mac=2e:17:01:b0:05:fb
> Jun 17 21:52:49 s1 ovs-cfg-mod: 
> 00006|cfg_mod|INFO|+port.vif4959.1.vif-uuid=271f0001-06ca-c9ca-cabc-dc79f412d925
> Jun 17 21:52:49 s1 ovs-cfg-mod: 
> 00007|cfg_mod|INFO|+port.vif4959.1.vm-uuid=23ecfa58-aa30-ea6a-f9fe-7cb2a5487592
> Jun 17 21:52:51 s1 xapi: [ info|s1|0 thread_zero||watchdog] received signal -7
> Jun 17 21:52:51 s1 xapi: [ info|s1|0 thread_zero||watchdog] xapi watchdog 
> exiting.
> Jun 17 21:52:51 s1 xapi: [ info|s1|0 thread_zero||watchdog] Fatal: xapi died 
> with signal -7: not restarting (watchdog never restarts on this signal)
> Jun 17 21:55:11 s1 python: PERFMON: caught IOError: (socket error (111, 
> 'Connection refused')) - restarting XAPI session
> Jun 17 22:00:02 s1 python: PERFMON: caught socket.error: (111 Connection 
> refused) - restarting XAPI session
> Jun 17 22:04:48 s1 python: PERFMON: caught socket.error: (111 Connection 
> refused) - restarting XAPI session
> Jun 17 22:09:58 s1 python: PERFMON: caught socket.error: (111 Connection 
> refused) - restarting XAPI session
> Jun 17 22:14:52 s1 python: PERFMON: caught socket.error: (111 Connection 
> refused) - restarting XAPI session
> Jun 17 22:19:38 s1 python: PERFMON: caught socket.error: (111 Connection 
> refused) - restarting XAPI session
> 
>> hi,
>> 
>> thanks.  i'll take a look at the log if it happens again.
>> 
>> YAMAMOTO Takashi
>> 
>>> This is usually the result of a failure earlier on. Could you grep through 
>>> the logs to get the whole trace of what went on? Best thing to do is grep 
>>> for VM.pool_migrate, then find the task reference (the hex string beginning 
>>> with 'R:' immediately after the 'VM.pool_migrate') and grep for this string 
>>> in the logs on both the source and destination machines. 
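>>> 
>>> Something along these lines (a rough sketch only -- point it at whichever 
>>> log file your setup uses; the snippets you posted came from 
>>> /var/log/messages) will pull out everything for the migrate task once it 
>>> has found the 'R:' references:
>>> 
>>>     import re, sys
>>> 
>>>     def trace_pool_migrate(logfile):
>>>         lines = open(logfile).readlines()
>>>         # collect the task references ('R:...') from VM.pool_migrate lines
>>>         refs = set()
>>>         for line in lines:
>>>             if "VM.pool_migrate" in line:
>>>                 m = re.search(r"R:[0-9a-f]+", line)
>>>                 if m:
>>>                     refs.add(m.group(0))
>>>         # then print every line that mentions one of those tasks
>>>         for line in lines:
>>>             for ref in refs:
>>>                 if ref in line:
>>>                     sys.stdout.write(line)
>>>                     break
>>> 
>>>     if __name__ == "__main__":
>>>         # e.g. python trace_pool_migrate.py /var/log/messages
>>>         trace_pool_migrate(sys.argv[1])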
>>> 
>>> Have a look through these, and if it's still not obvious what went wrong, 
>>> post them to the list and we can have a look.
>>> 
>>> Cheers,
>>> 
>>> Jon
>>> 
>>> 
>>> On 16 Jun 2010, at 07:19, YAMAMOTO Takashi wrote:
>>> 
>>>> hi,
>>>> 
>>>> after making my sr driver defer the attach operation as you suggested,
>>>> i got migration to work.  thanks!
>>>> 
>>>> however, when repeating live migration between two hosts for testing,
>>>> i got the following error.  it doesn't seem very reproducible.
>>>> do you have any idea?
>>>> 
>>>> YAMAMOTO Takashi
>>>> 
>>>> + xe vm-migrate live=true uuid=23ecfa58-aa30-ea6a-f9fe-7cb2a5487592 
>>>> host=67b8b07b-8c50-4677-a511-beb196ea766f
>>>> An error occurred during the migration process.
>>>> vm: 23ecfa58-aa30-ea6a-f9fe-7cb2a5487592 (CentOS53x64-1)
>>>> source: eea41bdd-d2ce-4a9a-bc51-1ca286320296 (s6)
>>>> destination: 67b8b07b-8c50-4677-a511-beb196ea766f (s1)
>>>> msg: Caught exception INTERNAL_ERROR: [ 
>>>> Xapi_vm_migrate.Remote_failed("unmarshalling result code from remote") ] 
>>>> at last minute during migration
>>>> 
>>>>> hi,
>>>>> 
>>>>> i'll try deferring the attach operation to vdi_activate.
>>>>> thanks!
>>>>> 
>>>>> YAMAMOTO Takashi
>>>>> 
>>>>>> Yup, vdi activate is the way forward.
>>>>>> 
>>>>>> If you advertise VDI_ACTIVATE and VDI_DEACTIVATE in the 
>>>>>> 'get_driver_info' response, xapi will call the following during the 
>>>>>> start-migrate-shutdown lifecycle:
>>>>>> 
>>>>>> VM start:
>>>>>> 
>>>>>> host A: VDI.attach
>>>>>> host A: VDI.activate
>>>>>> 
>>>>>> VM migrate:
>>>>>> 
>>>>>> host B: VDI.attach
>>>>>> 
>>>>>> (VM pauses on host A)
>>>>>> 
>>>>>> host A: VDI.deactivate
>>>>>> host B: VDI.activate
>>>>>> 
>>>>>> (VM unpauses on host B)
>>>>>> 
>>>>>> host A: VDI.detach
>>>>>> 
>>>>>> VM shutdown:
>>>>>> 
>>>>>> host B: VDI.deactivate
>>>>>> host B: VDI.detach
>>>>>> 
>>>>>> so the disk is never activated on both hosts at once, but it does still 
>>>>>> go through a period when it is attached to both hosts at once. So you 
>>>>>> could, for example, check that the disk *could* be attached on the 
>>>>>> vdi_attach SMAPI call, and actually attach it properly on the 
>>>>>> vdi_activate call.
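>>>>>> 
>>>>>> Roughly, it ends up looking like the sketch below -- schematic only, not 
>>>>>> the real SM driver plumbing, and the backend_* helpers are hypothetical 
>>>>>> stand-ins for whatever your product provides:
>>>>>> 
>>>>>>     # Hypothetical backend helpers -- replace with your product's own calls.
>>>>>>     def backend_can_attach(vdi_uuid):     # cheap check: could we attach?
>>>>>>         return True
>>>>>>     def backend_attach_volume(vdi_uuid):  # the real, exclusive attach
>>>>>>         pass
>>>>>>     def backend_detach_volume(vdi_uuid):  # the real detach
>>>>>>         pass
>>>>>> 
>>>>>>     CAPABILITIES = ["VDI_ATTACH", "VDI_DETACH",
>>>>>>                     "VDI_ACTIVATE", "VDI_DEACTIVATE"]
>>>>>> 
>>>>>>     def get_driver_info():
>>>>>>         # advertise activate/deactivate so xapi uses them around migrate
>>>>>>         return {"capabilities": CAPABILITIES}
>>>>>> 
>>>>>>     class MyVDI(object):
>>>>>>         def attach(self, sr_uuid, vdi_uuid):
>>>>>>             # During a migrate this runs on both hosts at once, so only
>>>>>>             # verify that the volume *could* be attached; do no real work.
>>>>>>             if not backend_can_attach(vdi_uuid):
>>>>>>                 raise Exception("VDI %s cannot be attached" % vdi_uuid)
>>>>>> 
>>>>>>         def activate(self, sr_uuid, vdi_uuid):
>>>>>>             # Never active on two hosts at once, so the exclusive,
>>>>>>             # real attach belongs here.
>>>>>>             backend_attach_volume(vdi_uuid)
>>>>>> 
>>>>>>         def deactivate(self, sr_uuid, vdi_uuid):
>>>>>>             # On migrate this runs on the source (with the VM paused)
>>>>>>             # before the destination's activate.
>>>>>>             backend_detach_volume(vdi_uuid)
>>>>>> 
>>>>>>         def detach(self, sr_uuid, vdi_uuid):
>>>>>>             # Nothing left to do once deactivate has done the real work.
>>>>>>             pass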
>>>>>> 
>>>>>> Hope this helps,
>>>>>> 
>>>>>> Jon
>>>>>> 
>>>>>> 
>>>>>> On 7 Jun 2010, at 09:26, YAMAMOTO Takashi wrote:
>>>>>> 
>>>>>>> hi,
>>>>>>> 
>>>>>>> on vm-migrate, xapi attaches a vdi on the migrate-to host
>>>>>>> before detaching it on the migrate-from host.
>>>>>>> unfortunately it doesn't work for our product, which doesn't
>>>>>>> provide a way to attach a volume to multiple hosts at the same time.
>>>>>>> is VDI_ACTIVATE something i can use as a workaround?
>>>>>>> or any other suggestions?
>>>>>>> 
>>>>>>> YAMAMOTO Takashi
>>>>>>> 


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel