
Re: [Xen-devel] Possible error restoring machine

CCing the Remus maintainer since all this non-blocking stuff is for

On Wed, 2012-05-23 at 10:39 +0100, Frediano Ziglio wrote:
> I noticed a possible problem when restoring a machine.
> In xc_domain_restore (xc_domain_restore.c), if it's not the last
> checkpoint we set the O_NONBLOCK flag (search for fcntl) so that we can
> call pagebuf_get or just load other pages (see the following
> "goto loadpages;" line).
> Now we could end up calling xc_tmem_restore/xc_tmem_restore_extra
> (xc_tmem.c), which call read_exact (xc_private.c) on the same
> non-blocking socket/file

There's a bunch of such places in that function; the RDEXACT macro is
also == rdexact except on MiniOS.

> but read_exact does not handle EAGAIN/EWOULDBLOCK
> (either can be returned on a non-blocking socket depending on file type
> and Unix/Linux version), leading to a failure.
> Does this make sense or is it impossible ??

Isn't this what the if line:
        len = read(fd, buf + offset, size - offset);
        if ( (len == -1) && ((errno == EINTR) || (errno == EAGAIN)) )

is doing?
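
For reference, a minimal sketch of the failure mode being discussed,
assuming a plain POSIX pipe rather than the real migration stream. The
helper names read_should_retry and set_nonblocking are hypothetical and
not part of libxc. On an empty non-blocking descriptor read() returns -1
with EAGAIN (or EWOULDBLOCK, which POSIX allows to be the same value), so
a caller that does not test for those errno values treats it as a hard
error:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not libxc): nonzero if a failed read() only means
 * "interrupted" or "no data yet" and the caller should try again. */
static int read_should_retry(int err)
{
    return err == EINTR || err == EAGAIN || err == EWOULDBLOCK;
}

/* Hypothetical helper (not libxc): add O_NONBLOCK to the descriptor's
 * existing flags. Returns 0 on success, -1 on error with errno set. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

A read loop that checks only EINTR would fail on such a descriptor as
soon as the sender falls behind.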

> Also note that rdexact (xc_domain_restore.c) handles data timeouts, but
> we can still block in read_exact called by
> xc_tmem_restore/xc_tmem_restore_extra.

Oh, wait! read_exact != rdexact -- ouch! Those are confusingly similar!
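
To make the distinction concrete, here is a rough sketch of what a
non-blocking-capable exact read with a timeout could look like. This is
not the rdexact code from xc_domain_restore.c; the name
read_exact_timeout and the timeout_ms parameter are made up for
illustration, and it uses poll() as one possible way to wait:

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not libxc): read exactly `size` bytes from a
 * blocking or non-blocking fd, waiting at most `timeout_ms` milliseconds
 * for each chunk of data (a negative value waits forever).
 * Returns 0 on success and -1 on failure. On failure errno is ETIMEDOUT
 * for a timeout, 0 for an unexpected end of file, or the error reported
 * by poll()/read(). */
static int read_exact_timeout(int fd, void *data, size_t size, int timeout_ms)
{
    unsigned char *buf = data;
    size_t offset = 0;

    while (offset < size) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);
        ssize_t len;

        if (ready == -1) {
            if (errno == EINTR)
                continue;           /* interrupted: wait again */
            return -1;
        }
        if (ready == 0) {
            errno = ETIMEDOUT;      /* no data within timeout_ms */
            return -1;
        }

        len = read(fd, buf + offset, size - offset);
        if (len == -1) {
            /* Not fatal on a non-blocking fd: go back to poll(). */
            if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
                continue;
            return -1;
        }
        if (len == 0) {
            errno = 0;              /* peer closed before `size` bytes */
            return -1;
        }
        offset += (size_t)len;
    }
    return 0;
}
```

Note that the timeout here applies per wait, not to the whole transfer,
and it restarts after an EINTR; a real implementation would have to
decide whether that is acceptable.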

I suspect we need to pull the xc_tmem_{save,restore} into the
appropriate file and use the non-blocking capable versions or to export
the non-blocking function, with an improved name, so it can be used from

Shriram, any thoughts?

> Last note on rdexact, isn't 1 second (HEARTBEAT_MS) too small if there
> are network problems?
> Frediano
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
