[Xen-devel] save / restore

Hello all,
I've upgraded to a testing version built after 2.0.4 (around 14 February).
Since then I'm able to save and restore a domain with more than one vbd exported
to it (earlier tests failed when a domain had more than one vbd).
But there still seem to be some limitations.
1) Domains with more than a few vbds fail to save:
one of my domains has 7 vbds:
    (device (vbd (uname phy:zeus_lvs/athena_root) (dev hdb1) (mode w)))
    (device (vbd (uname phy:zeus_lvs/athena_var) (dev hdb2) (mode w)))
    (device (vbd (uname phy:zeus_lvs/athena_usr) (dev hdb3) (mode w)))
    (device (vbd (uname phy:zeus_lvs/athena_home) (dev hdb4) (mode w)))
    (device (vbd (uname phy:zeus_lvs/athena_spool_mail) (dev hdb5) (mode w)))
    (device (vbd (uname phy:zeus_lvs/athena_save) (dev hdb6) (mode w)))
    (device (vbd (uname phy:zeus_lvs/athena_swap) (dev hdc1) (mode w)))
and saving failed:
zeus:/etc/xen# xm save athena /usr/save
Error: Error: [Failure instance: Traceback: xen.xend.XendError.XendError, save
Here is the extract from the xfrd log:
[DEBUG] Conn_sxpr>
(xfr.save 1 "(domain (id 1) (name athena) (memory 127) (maxmem 131072) (state
-b---) (cpu 0) (cpu_time 3843.15514214) (up_time 89086.369431) (start_time
1109629764.54) (console (status listening) (id 13) (domain 1) (local_port 13)
(remote_port 1) (console_port 9601)) (devices (vif (idx 0) (vif 0) (mac
aa:00:00:00:00:20) (bridge xen-br0) (evtchn 15 4) (index 0)) (vbd (idx 0) (vdev
833) (device 65028) (mode w) (dev hdb1) (uname phy:zeus_lvs/athena_root) (node
zeus_lvs/athena_root) (index 0)) (vbd (idx 1) (vdev 834) (device 65029) (mode w)
(dev hdb2) (uname phy:zeus_lvs/athena_var) (node zeus_lvs/athena_var) (index 1))
(vbd (idx 2) (vdev 835) (device 65030) (mode w) (dev hdb3) (uname
phy:zeus_lvs/athena_usr) (node zeus_lvs/athena_usr) (index 2)) (vbd (idx 3)
(vdev 836) (device 65031) (mode w) (dev hdb4) (uname phy:zeus_lvs/athena_home)
(node zeus_lvs/athena_home) (index 3)) (vbd (idx 4) (vdev 837) (device 65032)
(mode w) (dev hdb5) (uname phy:zeus_lvs/athena_spool_mail) (node
zeus_lvs/athena_spool_mail) (index 4)) (vbd (idx [DEBUG] Conn_sxpr< err=-12 2654
[INF] XFRD> Xfr service err=-12 
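
For what it's worth, if the err values reported by xfrd are plain negated errno codes (an assumption on my part, I have not checked the xfrd source), they can be decoded with a small Python sketch like the one below. The helper name describe_err is made up for illustration.

```python
import errno
import os


def describe_err(code):
    """Return 'NAME: message' for an errno-style code; the sign is ignored."""
    number = abs(int(code))
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return "%s: %s" % (name, os.strerror(number))


# Assumption: xfrd's "err=-12" is a negated errno value.
print(describe_err(-12))
```

On Linux, 12 is ENOMEM, which would at least be consistent with the request being cut off in the middle of the vbd list above.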

xend.log has this information:

[2005-03-02 00:14:10 xend] INFO (XendMigrate:370) Save BEGIN: ['save', ['id', '1
'], ['state', 'begin'], ['domain', '1'], ['file', '/usr/save']]
[2005-03-02 00:14:11 xend] INFO (XendRoot:91) EVENT> xend.domain.save ['athena',
 '1', 'begin', ['save', ['id', '1'], ['state', 'begin'], ['domain', '1'], ['file
', '/usr/save']]]
[2005-03-02 00:14:11 xend] INFO (XendMigrate:390) Save ERROR: ['save', ['id', '1
'], ['state', 'error'], ['domain', '1'], ['file', '/usr/save']]
[2005-03-02 00:14:11 xend] INFO (XendRoot:91) EVENT> xend.domain.save ['athena',
 '1', 'error', ['save', ['id', '1'], ['state', 'error'], ['domain', '1'], ['file
', '/usr/save']]]

and xend-debug.log has this:

sync_session> <type 'str'> 1 ['save', ['id', '1'], ['state', 'begin'], ['domain'
, '1'], ['file', '/usr/save']]
Started to connect self= <xen.xend.XendMigrate.XfrdClientFactory instance at 0xb
78e1f4c> connector= <twisted.internet.tcp.Connector instance at 0xb78e1d8c>
buildProtocol> IPv4Address(TCP, 'localhost', 8002)

2) A simple domain with 2 vifs fails to save (even if there is only one

3) A simple domain (1 vif + 2 vbds) fails to save when a device is exported to it.
This is the case for my firewall, which has the external network card hidden from
Dom0 and exposed to a DomU.
When I issue the xm save command, it takes quite a long time (about 1 minute), and
then I get this message:
zeus:/var/log# xm save cerbere /usr/cerbere
Error: Error: [Failure instance: Traceback: twisted.internet.defer.TimeoutError,
Callback timed out

The xfrd log contains this message:
[DEBUG] Conn_sxpr>
(xfr.save 2 "(domain (id 2) (name cerbere) (memory 31) (maxmem 32768) (state -b-
--) (cpu 0) (cpu_time 108.68242009) (up_time 89498.2455249) (start_time 11096297
66.13) (console (status listening) (id 16) (domain 2) (local_port 16) (remote_po
rt 1) (console_port 9602)) (devices (vif (idx 0) (vif 0) (mac aa:00:00:00:00:10)
 (bridge xen-br0) (evtchn 18 5) (index 0)) (vbd (idx 0) (vdev 833) (device 65026
) (mode w) (dev hdb1) (uname phy:zeus_lvs/cerbere_root) (node zeus_lvs/cerbere_r
oot) (index 0)) (vbd (idx 1) (vdev 834) (device 65027) (mode w) (dev hdb2) (unam
e phy:zeus_lvs/cerbere_var) (node zeus_lvs/cerbere_var) (index 1))) (config (vm
(name cerbere) (id 3) (memory 32) (image (linux (kernel /boot/vmlinuz-2.6.10-xen
0) (root '/dev/hdb1 ro') (ip off))) (device (pci (bus 0x00) (dev 0x0B) (func 0x0
))) (device (vbd (uname phy:zeus_lvs/cerbere_root) (dev hdb1) (mode w))) (device
 (vbd (uname phy:zeus_lvs/cerbere_var) (dev hdb2) (mode w))) (device (vif (mac A
A:00:00:00:00:10) (bridge xen-br0))) (restart onreboot))))" /usr/cerbere)[DEBUG]
 Conn_sxpr< err=0
[1109719264.449167] xc_linux_save start 2

xc_linux_save start 2
[DEBUG] Conn_sxpr>
(xfr.err 22)[DEBUG] Conn_sxpr< err=0
Retry suspend domain (10005) <-- this line is repeated a large number of times
Retry suspend domain (10005)
Retry suspend domain (10005)
Unable to suspend domain. (10005)
Unable to suspend domain. (10005)
Domain appears not to have suspended: 10005
Domain appears not to have suspended: 10005
2662 [INF] XFRD> Xfr service err=0

xend-debug.log contains these lines:
sync_session> <type 'str'> 2 ['save', ['id', '2'], ['state', 'begin'], ['domain'
, '2'], ['file', '/usr/cerbere']]
Started to connect self= <xen.xend.XendMigrate.XfrdClientFactory instance at 0xb
78e126c> connector= <twisted.internet.tcp.Connector instance at 0xb78e1fac>
buildProtocol> IPv4Address(TCP, 'localhost', 8002)
***request> (domain (id 2) (name cerbere) (memory 31) (maxmem 32768) (state -b--
-) (cpu 0) (cpu_time 108.68242009) (up_time 89498.233706) (start_time 1109629766
.13) (console (status listening) (id 16) (domain 2) (local_port 16) (remote_port
 1) (console_port 9602)) (devices (vif (idx 0) (vif 0) (mac aa:00:00:00:00:10) (
bridge xen-br0) (evtchn 18 5) (index 0)) (vbd (idx 0) (vdev 833) (device 65026)
(mode w) (dev hdb1) (uname phy:zeus_lvs/cerbere_root) (node zeus_lvs/cerbere_roo
t) (index 0)) (vbd (idx 1) (vdev 834) (device 65027) (mode w) (dev hdb2) (uname
phy:zeus_lvs/cerbere_var) (node zeus_lvs/cerbere_var) (index 1))) (config (vm (n
ame cerbere) (id 3) (memory 32) (image (linux (kernel /boot/vmlinuz-2.6.10-xen0)
 (root '/dev/hdb1 ro') (ip off))) (device (pci (bus 0x00) (dev 0x0B) (func 0x0))
) (device (vbd (uname phy:zeus_lvs/cerbere_root) (dev hdb1) (mode w))) (device (
vbd (uname phy:zeus_lvs/cerbere_var) (dev hdb2) (mode w))) (device (vif (mac AA:
00:00:00:00:10) (bridge xen-br0))) (restart onreboot))))
***request> begin
xfr_err> ['xfr.err', '0']
xfr_err> <type 'str'> 0
xfr_vm_suspend> ['xfr.vm.suspend', '2']
VirqClient.virqReceived> 4
vif-bridge down vif=vif2.0 domain=cerbere mac=aa:00:00:00:00:10 bridge=xen-br0
xfr_vm_suspend>cberr> [Failure instance: Traceback: twisted.internet.defer.Timeo
utError, Callback timed out
Error> [Failure instance: Traceback: twisted.internet.defer.TimeoutError, Callba
ck timed out
Error> calling errback
***cbremove> [Failure instance: Traceback: twisted.internet.defer.TimeoutError,
Callback timed out
***_delete_session> 2
xfr_err> ['xfr.err', '1']
xfr_err> <type 'str'> 1
Error> 1
Xfrd>connectionLost> [Failure instance: Traceback: twisted.internet.error.Connec
tionDone, Connection was closed cleanly.
XfrdSaveInfo>connectionLost> [Failure instance: Traceback: twisted.internet.erro
r.ConnectionDone, Connection was closed cleanly.

After this, the domain no longer exists and the save file is empty.

I'm hoping for your help ...

