[Xen-devel] error handling in libxl_domain_suspend
Ian, Wei,

we got a report about a crash in libxl_domain_suspend like this, from
'virsh migrate --live xen+ssh://host':

#1 helper_done (egc=0x7fc0284aa6c0, shs=0x7fc0180256c8) at libxl_save_callout.c:371
   helper_failed
   helper_stop
   libxl__save_helper_abort
#2 check_all_finished (egc=0x7fc0284aa6c0, stream=0x7fc018025698, rc=-3) at libxl_stream_write.c:671
   stream_done
   stream_complete
   write_done
   dc->callback == write_done
   efd->func == datacopier_writable
#3 afterpoll_internal (egc=egc@entry=0x7fc0284aa6c0, poller=poller@entry=0x7fc018003f20, nfds=4, fds=0x7fc018002d00, now=...) at libxl_event.c:1269

I inserted the extra call trace manually for better understanding.

The issue is that a failed poll will crash libxl. The actual error was:

libxl_aoutils.c:328:datacopier_writable: unexpected poll event 0x1c on fd 37 (should be POLLOUT) writing libxc header during copy of save v2 stream

In this case revents in datacopier_writable is POLLHUP|POLLERR|POLLOUT,
which triggers datacopier_callback. In helper_done,
shs->completion_callback is still zero:

(gdb) p stream.shs
$32 = {ao = 0x7f3fa4002d10, domid = 0, callbacks = {
    save = {a = {suspend = 0x7f3f99c8e220 <libxl__domain_suspend_callback>,
        postcopy = 0x0, checkpoint = 0x0, wait_checkpoint = 0x0,
        switch_qemu_logdirty = 0x7f3f99c8eca0 <libxl__domain_suspend_common_switch_qemu_logdirty>}},
    restore = {a = {suspend = 0x7f3f99c8e220 <libxl__domain_suspend_callback>,
        postcopy = 0x0, checkpoint = 0x0, wait_checkpoint = 0x0,
        restore_results = 0x7f3f99c8eca0 <libxl__domain_suspend_common_switch_qemu_logdirty>}}},
  recv_callback = 0x0, completion_callback = 0x0, caller_state = 0x0,
  need_results = 0, rc = 0, completed = 0, retval = 0, errnoval = 0,
  abrt = {ao = 0x0, callback = 0x0, registered = false, entry = {
      le_next = 0x0, le_prev = 0x0}},
  pipes = {0x0, 0x0},
  readable = {fd = -1, events = 0, func = 0x0, entry = {le_next = 0x0,
      le_prev = 0x0}, nexus = 0x0},
  child = {pid = -1, callback = 0x0, entry = {le_next = 0x0, le_prev = 0x0}},
  stdin_what = 0x0, stdout_what = 0x0, egc = 0x0}

Even if helper_done checked whether shs->completion_callback is valid,
check_all_finished would apparently cycle forever:

(gdb) p stream.completion_callback
$35 = (void (*)(libxl__egc *, libxl__stream_write_state *, int)) 0x7f3f99c8e890 <stream_done>

stream_done would call check_all_finished again.

My understanding of the code is that libxl__xc_domain_save fills
dss.sws.shs. But that function is only called after stream_header_done.
Any error before that will leave dss partly uninitialized.

How is this supposed to be fixed?

Olaf

Attachment:
pgptnq8Yp_zHA.pgp
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
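
For reference, the revents value 0x1c from the log message above can be
decoded with a few lines of C. This is only a sketch: describe_revents is
a hypothetical helper, not something that exists in libxl, and the numeric
values of the POLL* constants are platform specific (on Linux, 0x1c is
POLLOUT|POLLERR|POLLHUP, matching the description in the mail).

```c
#include <poll.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxl): write the names of the poll
 * flags set in 'revents' into 'buf', separated by '|'.
 * 'len' is the size of 'buf' and must be at least 1.
 */
static void describe_revents(short revents, char *buf, size_t len)
{
    static const struct {
        short bit;
        const char *name;
    } flags[] = {
        { POLLIN,   "POLLIN"   },
        { POLLPRI,  "POLLPRI"  },
        { POLLOUT,  "POLLOUT"  },
        { POLLERR,  "POLLERR"  },
        { POLLHUP,  "POLLHUP"  },
        { POLLNVAL, "POLLNVAL" },
    };
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (!(revents & flags[i].bit))
            continue;
        /* Add a separator before every name except the first. */
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, flags[i].name, len - strlen(buf) - 1);
    }
}
```

Calling describe_revents(0x1c, buf, sizeof(buf)) with a 64 byte buffer on
Linux should leave "POLLOUT|POLLERR|POLLHUP" in buf. POLLOUT is set, but
it arrives together with the error and hangup bits, which is presumably
why the datacopier reports the event as unexpected.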