[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal



> -----Original Message-----
> From: David Scott [mailto:dave.scott@xxxxxxxxxx]
> Sent: 25 June 2014 22:15
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Paul Durrant; Ian Campbell; john.liuqiming@xxxxxxxxxx;
> yanqiangjun@xxxxxxxxxx; Dave Scott
> Subject: [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal
> 
> Currently hvmloader uses the xenstore ring and then tries to
> reset it back to its initial state. This is not part of the
> ring protocol and, if xenstored reads the ring while it is
> happening, xenstored will conclude it is corrupted. A corrupted
> ring will prevent PV drivers from connecting. This seems to
> be a rare failure.
> 
> Furthermore, when a VM crashes it may jump to a 'crash kernel'
> to create a diagnostic dump. Without the ability to safely
> reset the ring the PV drivers won't be able to reliably
> establish connections, to (for example) stream a memory dump to
> disk.
> 
> This prototype patch contains a simple extension of the
> xenstore ring structure, enough to contain version numbers and
> a 'closing' flag. This memory is currently unused within the
> 4k page and should be zeroed as part of the domain setup
> process. The oxenstored server advertises version 1, and
> hvmloader detects this and sets the 'closing' flag. The server
> then reinitialises the ring, filling it with obviously invalid
> data to help debugging (unfortunately blocks of zeroes are
> valid xenstore packets) and signals hvmloader by the event
> channel that it's safe to boot the guest OS.
> 
> This patch has only been lightly tested. I'd appreciate
> feedback on the approach before going further.
> 
> Signed-off-by: David Scott <dave.scott@xxxxxxxxxx>
> ---
>  tools/firmware/hvmloader/xenbus.c   |   16 +++++--
>  tools/ocaml/libs/xb/xb.ml           |   26 ++++++++++-
>  tools/ocaml/libs/xb/xb.mli          |    3 +-
>  tools/ocaml/libs/xb/xs_ring.ml      |   13 ++++++
>  tools/ocaml/libs/xb/xs_ring_stubs.c |   81
> ++++++++++++++++++++++++++++++++++-
>  xen/include/public/io/xs_wire.h     |    8 ++++
>  6 files changed, 138 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/xenbus.c
> b/tools/firmware/hvmloader/xenbus.c
> index fe72e97..15d961b 100644
> --- a/tools/firmware/hvmloader/xenbus.c
> +++ b/tools/firmware/hvmloader/xenbus.c
> @@ -37,6 +37,8 @@ static struct xenstore_domain_interface *rings; /*
> Shared ring with dom0 */
>  static evtchn_port_t event;                     /* Event-channel to dom0 */
>  static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area
> */
> 
> +static void ring_wait(void);
> +

Personally, I'd move the definition of ring_wait() up rather than forward 
declaring it.

>  /* Connect our xenbus client to the backend.
>   * Call once, before any other xenbus actions. */
>  void xenbus_setup(void)
> @@ -68,10 +70,16 @@ void xenbus_shutdown(void)
> 
>      ASSERT(rings != NULL);
> 
> -    /* We zero out the whole ring -- the backend can handle this, and it's
> -     * not going to surprise any frontends since it's equivalent to never
> -     * having used the rings. */
> -    memset(rings, 0, sizeof *rings);
> +    if (rings->server_version > XENSTORE_VERSION_0) {

I notice in the changes to xs_wire.h below you have a client_version field. 
Should that not be set here too?

> +        rings->closing = 1;
> +        while (rings->closing == 1)
> +            ring_wait ();
> +    } else {
> +        /* If the backend reads the state while we're erasing it then the
> +           ring state will become corrupted, preventing guest frontends from
> +           connecting. This is rare. */
> +        memset(rings, 0, sizeof *rings);
> +    }
> 
>      /* Clear the event-channel state too. */
>      memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
> index 29d354d..d5cd776 100644
> --- a/tools/ocaml/libs/xb/xb.ml
> +++ b/tools/ocaml/libs/xb/xb.ml
> @@ -84,7 +84,26 @@ let write con s len =
>       | Fd backfd     -> write_fd backfd con s len
>       | Xenmmap backmmap -> write_mmap backmmap con s len
> 
> -let output con =
> +(* If a function throws Xs_ring.Closing, then clear the ring state
> +   and serve the ring again. *)
> +let rec handle_closing f con =
> +     match (try Some (f con) with Xs_ring.Closing -> None) with
> +     | Some x -> x
> +     | None ->
> +             begin match con.backend with
> +             | Fd _ -> raise Xs_ring.Closing (* should never happen, but
> just in case *)
> +             | Xenmmap backend ->
> +                     Xs_ring.close backend.mmap;
> +                     backend.eventchn_notify ();
> +                     (* Clear our old connection state *)
> +                     Queue.clear con.pkt_in;
> +                     Queue.clear con.pkt_out;
> +                     con.partial_in <- init_partial_in ();
> +                     con.partial_out <- "";
> +                     handle_closing f con
> +             end
> +
> +let output = handle_closing (fun con ->
>       (* get the output string from a string_of(packet) or partial_out *)
>       let s = if String.length con.partial_out > 0 then
>                       con.partial_out
> @@ -101,8 +120,9 @@ let output con =
>       );
>       (* after sending one packet, partial is empty *)
>       con.partial_out = ""
> +)
> 
> -let input con =
> +let input = handle_closing (fun con ->
>       let newpacket = ref false in
>       let to_read =
>               match con.partial_in with
> @@ -133,6 +153,7 @@ let input con =
>                       HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
>       );
>       !newpacket
> +)
> 
>  let newcon backend = {
>       backend = backend;
> @@ -145,6 +166,7 @@ let newcon backend = {
>  let open_fd fd = newcon (Fd { fd = fd; })
> 
>  let open_mmap mmap notifyfct =
> +     Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *)
>       newcon (Xenmmap {
>               mmap = mmap;
>               eventchn_notify = notifyfct;
> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
> index 58234ae..af7ea5c 100644
> --- a/tools/ocaml/libs/xb/xb.mli
> +++ b/tools/ocaml/libs/xb/xb.mli
> @@ -23,7 +23,7 @@ module Op :
>        | Resume
>        | Set_target
>        | Restrict
> -      | Invalid (* Not a valid wire operation *)
> +      | Invalid
>      val operation_c_mapping : operation array
>      val size : int
>      val array_search : 'a -> 'a array -> int
> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int
>  val write_fd : backend_fd -> 'a -> string -> int -> int
>  val write_mmap : backend_mmap -> 'a -> string -> int -> int
>  val write : t -> string -> int -> int
> +val handle_closing : (t -> 'a) -> t -> 'a
>  val output : t -> bool
>  val input : t -> bool
>  val newcon : backend -> t
> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
> index 9469609..dadf9e1 100644
> --- a/tools/ocaml/libs/xb/xs_ring.ml
> +++ b/tools/ocaml/libs/xb/xs_ring.ml
> @@ -14,5 +14,18 @@
>   * GNU Lesser General Public License for more details.
>   *)
> 
> +exception Closing
> +
> +let _ =
> +  Callback.register_exception "Xs_ring.Closing" Closing
> +
>  external read: Xenmmap.mmap_interface -> string -> int -> int =
> "ml_interface_read"
>  external write: Xenmmap.mmap_interface -> string -> int -> int =
> "ml_interface_write"
> +
> +external set_client_version: Xenmmap.mmap_interface -> int -> unit =
> "ml_interface_set_client_version" "noalloc"
> +external get_client_version: Xenmmap.mmap_interface -> int =
> "ml_interface_get_client_version" "noalloc"
> +
> +external set_server_version: Xenmmap.mmap_interface -> int -> unit =
> "ml_interface_set_server_version" "noalloc"
> +external get_server_version: Xenmmap.mmap_interface -> int =
> "ml_interface_get_server_version" "noalloc"
> +
> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close"
> "noalloc"
> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
> b/tools/ocaml/libs/xb/xs_ring_stubs.c
> index 8bd1047..4ddf5a7 100644
> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> @@ -35,19 +35,28 @@
> 
>  #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
> 
> +#define ERROR_UNKNOWN (-1)
> +#define ERROR_CLOSING (-2)
> +
>  static int xs_ring_read(struct mmap_interface *interface,
>                               char *buffer, int len)
>  {
>       struct xenstore_domain_interface *intf = interface->addr;
>       XENSTORE_RING_IDX cons, prod; /* offsets only */
>       int to_read;
> +        uint32_t closing;
> 
>       cons = *(volatile uint32*)&intf->req_cons;
>       prod = *(volatile uint32*)&intf->req_prod;
> +     closing = *(volatile uint32*)&intf->closing;
> +
> +     if (closing == 1)

Do you really mean if == 1 here? It's a flag, so != 0 would seem better. Also, 
should you test the client version?

> +             return ERROR_CLOSING;
> +
>       xen_mb();
> 
>       if ((prod - cons) > XENSTORE_RING_SIZE)
> -         return -1;
> +         return ERROR_UNKNOWN;
> 
>       if (prod == cons)
>               return 0;
> @@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface
> *interface,
>       struct xenstore_domain_interface *intf = interface->addr;
>       XENSTORE_RING_IDX cons, prod;
>       int can_write;
> +     uint32_t closing;
> 
>       cons = *(volatile uint32*)&intf->rsp_cons;
>       prod = *(volatile uint32*)&intf->rsp_prod;
> +     closing = *(volatile uint32*)&intf->closing;
> +
> +     if (closing == 1)

Same again.

> +             return ERROR_CLOSING;
> +
>       xen_mb();
>       if ( (prod - cons) >= XENSTORE_RING_SIZE )
>               return 0;
> @@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface,
> value buffer, value len)
> 
>       res = xs_ring_read(GET_C_STRUCT(interface),
>                          String_val(buffer), Int_val(len));
> -     if (res == -1)
> +     if (res == ERROR_UNKNOWN)
>               caml_failwith("bad connection");
> +
> +     if (res == ERROR_CLOSING)
> +
>       caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
> +
>       result = Val_int(res);
>       CAMLreturn(result);
>  }
> @@ -111,6 +130,64 @@ CAMLprim value ml_interface_write(value interface,
> value buffer, value len)
> 
>       res = xs_ring_write(GET_C_STRUCT(interface),
>                           String_val(buffer), Int_val(len));
> +
> +     if (res == ERROR_CLOSING)
> +
>       caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
> +
>       result = Val_int(res);
>       CAMLreturn(result);
>  }
> +
> +CAMLprim value ml_interface_set_client_version(value interface, value v)
> +{
> +     CAMLparam2(interface, v);
> +     struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> +     intf->client_version = Int_val(v);

Why would the server ever set the client version?

> +
> +     CAMLreturn(Val_unit);
> +}
> +
> +CAMLprim value ml_interface_get_client_version(value interface)
> +{
> +     CAMLparam1(interface);
> +     struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> +     CAMLreturn(Val_int (intf->client_version));
> +}
> +
> +CAMLprim value ml_interface_set_server_version(value interface, value v)
> +{
> +     CAMLparam2(interface, v);
> +     struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> +     intf->server_version = Int_val(v);
> +
> +     CAMLreturn(Val_unit);
> +}
> +
> +CAMLprim value ml_interface_get_server_version(value interface)
> +{
> +     CAMLparam1(interface);
> +     struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> +     CAMLreturn(Val_int (intf->server_version));
> +}
> +
> +CAMLprim value ml_interface_close(value interface)
> +{
> +     CAMLparam1(interface);
> +     const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
> +     struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +     int i;
> +
> +     intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod =
> 0;
> +     /* Ensure the unused space is full of invalid xenstore packets. */
> +     for (i = 0; i < XENSTORE_RING_SIZE; i++) {
> +             intf->req[i] = invalid_data[i % sizeof(invalid_data)];
> +             intf->rsp[i] = invalid_data[i % sizeof(invalid_data)];
> +     }
> +     xen_mb ();
> +     intf->closing = 0;
> +     CAMLreturn(Val_unit);
> +}
> diff --git a/xen/include/public/io/xs_wire.h
> b/xen/include/public/io/xs_wire.h
> index 585f0c8..68460cc 100644
> --- a/xen/include/public/io/xs_wire.h
> +++ b/xen/include/public/io/xs_wire.h
> @@ -104,6 +104,9 @@ enum xs_watch_type
>      XS_WATCH_TOKEN
>  };
> 
> +#define XENSTORE_VERSION_0 0
> +#define XENSTORE_VERSION_1 1
> +
>  /*
>   * `incontents 150 xenstore_struct XenStore wire protocol.
>   *
> @@ -112,10 +115,15 @@ enum xs_watch_type
>  typedef uint32_t XENSTORE_RING_IDX;
>  #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
>  struct xenstore_domain_interface {
> +    /* XENSTORE_VERSION_0 */
>      char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>      char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>      XENSTORE_RING_IDX req_cons, req_prod;
>      XENSTORE_RING_IDX rsp_cons, rsp_prod;
> +    uint32_t client_version;
> +    uint32_t server_version;

Should any values below perhaps be part of a union so that future versions may 
replace them, or do you intend that future versions should only extend the set 
of fields?

  Paul

> +    /* XENSTORE_VERSION_1 */
> +    uint32_t closing;
>  };
> 
>  /* Violating this is very bad.  See docs/misc/xenstore.txt. */
> --
> 1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.