
Re: [Xen-devel] oxenstored memory leak? seems related with XSA-38



Hi,

This patch works fine in my testing: it protects oxenstored from a malicious
domain (one I made myself), but I'm not sure whether the check logic covers
all conditions.

As oxenstored is a shared resource, I think it's a good idea to add some QoS
mechanism.
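For reference, the check the patch adds mirrors cxenstored's check_indexes(). A minimal standalone sketch of why it catches a malicious producer (the XENSTORE_RING_SIZE value is taken from xen/io/xs_wire.h; this is an illustration, not the xenstored source):

```c
#include <stdint.h>

#define XENSTORE_RING_SIZE 1024          /* as in xen/io/xs_wire.h */
typedef uint32_t XENSTORE_RING_IDX;

/* The ring indices are free-running counters, so (prod - cons) computed
 * in unsigned arithmetic is the number of unread bytes, even across
 * counter wraparound.  A value larger than the ring size can only come
 * from corrupted indices, i.e. a buggy or malicious guest. */
static int check_indexes(XENSTORE_RING_IDX cons, XENSTORE_RING_IDX prod)
{
    return (prod - cons) <= XENSTORE_RING_SIZE;
}
```

For example, a guest that resets req_prod to 0 while req_cons is 100 makes prod - cons wrap to a huge unsigned value, so the connection is flagged as bad instead of being read forever.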

> -----Original Message-----
> From: David Scott [mailto:dave.scott@xxxxxxxxxxxxx]
> Sent: Monday, July 15, 2013 8:13 PM
> To: Liuqiming (John)
> Cc: andrew.cooper3@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxx; Yanqiangjun
> Subject: Re: [Xen-devel] oxenstored memory leak? seems related with
> XSA-38
> 
> Hi,
> 
> On 05/07/13 10:07, Liuqiming (John) wrote:
> >
> > Here is my patch that try to fix this issue.
> >
> > The whole idea is: add a check when reading from the IO ring, and if an
> > error happens, mark the reading connection as "bad". Until the VM
> > reboots, oxenstored will not handle messages from this connection any more.
> 
> I think detecting a bad client and avoiding wasting CPU time on it is a
> good idea. Is this patch working well for you in your testing?
> 
> In future I wonder whether we should add some form of request
> rate-limiting in addition to the per-domain quotas, to prevent one
> domain from taking more than its fair share of xenstored time. I
> imagine that a non-malicious domain can still keep xenstored busy with
> 'legitimate' traffic (although hopefully it still provides services to
> other guests, albeit more slowly)
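A sketch of what such rate-limiting could look like, as a hypothetical per-domain token bucket (none of these names exist in xenstored; this only illustrates the idea of deferring a domain that exceeds its share):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-domain limiter: each domain earns `rate` request
 * credits per tick, capped at `burst`; a request is only processed when
 * a credit is available, otherwise the domain's ring is left for later. */
struct token_bucket {
    uint64_t tokens;     /* current credits */
    uint64_t burst;      /* maximum credits */
    uint64_t rate;       /* credits earned per tick */
    uint64_t last_tick;  /* time of last refill */
};

static void tb_refill(struct token_bucket *tb, uint64_t now)
{
    uint64_t earned = (now - tb->last_tick) * tb->rate;
    tb->tokens = tb->tokens + earned > tb->burst ? tb->burst
                                                 : tb->tokens + earned;
    tb->last_tick = now;
}

static bool tb_allow(struct token_bucket *tb, uint64_t now)
{
    tb_refill(tb, now);
    if (tb->tokens == 0)
        return false;    /* over its fair share: defer this domain */
    tb->tokens--;
    return true;
}
```

With something like this, a chatty but non-malicious domain is throttled rather than starving the other guests of xenstored service.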
> 
> Cheers,
> Dave
> 
> >
> > diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
> b/tools/ocaml/libs/xb/xs_ring_stubs.c
> > index fdd9983..a9ca456 100644
> > --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> > +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> > @@ -45,6 +45,10 @@
> >     cons = *(volatile uint32*)&intf->req_cons;
> >     prod = *(volatile uint32*)&intf->req_prod;
> >     xen_mb();
> > +
> > +   if ((prod - cons) > XENSTORE_RING_SIZE)
> > +       return -1;
> > +
> >     if (prod == cons)
> >             return 0;
> >     cons = MASK_XENSTORE_IDX(cons);
> > @@ -94,7 +98,7 @@
> >     res = xs_ring_read(GET_C_STRUCT(interface),
> >                        String_val(buffer), Int_val(len));
> >     if (res == -1)
> > -           caml_failwith("huh");
> > +           caml_failwith("bad client");
> >     result = Val_int(res);
> >     CAMLreturn(result);
> >   }
> > diff --git a/tools/ocaml/xenstored/connection.ml
> b/tools/ocaml/xenstored/connection.ml
> > index 32e2f2e..e862c79 100644
> > --- a/tools/ocaml/xenstored/connection.ml
> > +++ b/tools/ocaml/xenstored/connection.ml
> > @@ -38,6 +38,11 @@
> >     mutable perm: Perms.Connection.t;
> >   }
> >
> > +let mark_as_bad con =
> > +   match con.dom with
> > +   | None -> ()
> > +   | Some domain -> Domain.mark_as_bad domain
> > +
> >   let get_path con =
> >   Printf.sprintf "/local/domain/%i/" (match con.dom with None -> 0 |
> Some d -> Domain.get_id d)
> >
> > diff --git a/tools/ocaml/xenstored/domain.ml
> b/tools/ocaml/xenstored/domain.ml
> > index 85ab282..444069d 100644
> > --- a/tools/ocaml/xenstored/domain.ml
> > +++ b/tools/ocaml/xenstored/domain.ml
> > @@ -27,6 +27,7 @@
> >     interface: Xenmmap.mmap_interface;
> >     eventchn: Event.t;
> >     mutable port: Xeneventchn.t option;
> > +   mutable bad_client: bool;
> >   }
> >
> >   let get_path dom = "/local/domain/" ^ (sprintf "%u" dom.id)
> > @@ -34,6 +35,9 @@
> >   let get_interface d = d.interface
> >   let get_mfn d = d.mfn
> >   let get_remote_port d = d.remote_port
> > +
> > +let is_bad_domain domain = domain.bad_client
> > +let mark_as_bad domain = domain.bad_client <- true
> >
> >   let string_of_port = function
> >   | None -> "None"
> > @@ -68,7 +72,8 @@
> >     remote_port = remote_port;
> >     interface = interface;
> >     eventchn = eventchn;
> > -   port = None
> > +   port = None;
> > +   bad_client = false
> >   }
> >
> >   let is_dom0 d = d.id = 0
> > diff --git a/tools/ocaml/xenstored/process.ml
> b/tools/ocaml/xenstored/process.ml
> > index a4ff741..2267ddc 100644
> > --- a/tools/ocaml/xenstored/process.ml
> > +++ b/tools/ocaml/xenstored/process.ml
> > @@ -374,7 +374,17 @@
> >     Logging.xb_answer ~ty ~tid ~con:(Connection.get_domstr con)
> data
> >
> >   let do_input store cons doms con =
> > -   if Connection.do_input con then (
> > +   let newpacket =
> > +           try
> > +                   Connection.do_input con
> > +           with Failure exp ->
> > +                   error "caught exception %s" exp;
> > +                   error "got a bad client %s" (sprintf "%-8s"
> (Connection.get_domstr con));
> > +                   Connection.mark_as_bad con;
> > +                   false
> > +   in
> > +
> > +   if newpacket then (
> >             let packet = Connection.pop_in con in
> >             let tid, rid, ty, data = Xenbus.Xb.Packet.unpack packet in
> >             (* As we don't log IO, do not call an unnecessary sanitize_data
> > diff --git a/tools/ocaml/xenstored/xenstored.ml
> b/tools/ocaml/xenstored/xenstored.ml
> > index 4045aed..cfc2a4b 100644
> > --- a/tools/ocaml/xenstored/xenstored.ml
> > +++ b/tools/ocaml/xenstored/xenstored.ml
> > @@ -50,9 +50,10 @@
> >
> >   let process_domains store cons domains =
> >     let do_io_domain domain =
> > -           let con = Connections.find_domain cons (Domain.get_id domain) in
> > -           Process.do_input store cons domains con;
> > -           Process.do_output store cons domains con in
> > +           if not (Domain.is_bad_domain domain) then
> > +                   let con = Connections.find_domain cons (Domain.get_id
> domain) in
> > +                   Process.do_input store cons domains con;
> > +                   Process.do_output store cons domains con in
> >     Domains.iter domains do_io_domain
> >
> >   let sigusr1_handler store =
> >
> >
> >> -----Original Message-----
> >> From: Liuqiming (John)
> >> Sent: Friday, July 05, 2013 11:14 AM
> >> To: 'andrew.cooper3@xxxxxxxxxx'
> >> Cc: Yanqiangjun; 'xen-devel@xxxxxxxxxxxxx'
> >> Subject: RE: [Xen-devel] oxenstored memory leak? seems related with
> >> XSA-38
> >>
> >> Hi Andrew,
> >>
> >> Yes, I know about the second patch committed by Ian Campbell.
> >>
> >> Actually I have applied all the patches from
> >>
> http://wiki.xen.org/wiki/Security_Announcements#XSA_38_oxenstored_inc
> >> orrect_handling_of_certain_Xenbus_ring_states
> >>
> >> But this issue still exists.
> >>
> >>> On 04/07/13 03:48, Liuqiming (John) wrote:
> >>>> Hi all,
> >>>>
> >>>> Continue my test about oxenstored:
> >>>>
> >>>> I switched to the original C xenstored and tested my "broken" VM.
> >>>> cxenstored does not have the "memory leak" issue.
> >>>> So I compared the IO ring handling logic between cxenstored and
> >>>> oxenstored and found the difference:
> >>>>
> >>>> In cxenstored, after getting the cons and prod values, an index check
> >>>> is performed:
> >>>>
> >>>>  if (!check_indexes(cons, prod)) {
> >>>>          errno = EIO;
> >>>>          return -1;
> >>>>  }
> >>>>
> >>>> static bool check_indexes(XENSTORE_RING_IDX cons,
> >>> XENSTORE_RING_IDX prod)
> >>>> {
> >>>>  return ((prod - cons) <= XENSTORE_RING_SIZE);
> >>>> }
> >>>>
> >>>> So any connection with prod - cons > XENSTORE_RING_SIZE will be
> >>>> treated as a "bad client", and cxenstored will not handle its IO ring
> >>>> messages any more.
> >>>>
> >>>> But in oxenstored, there is just a simple comparison between prod and
> >>> cons
> >>>>      if (prod == cons)
> >>>>          return 0;
> >>>>
> >>>> so this leaves a security hole: a guest VM user who can manipulate
> >>>> prod and cons can make oxenstored's memory usage grow until the
> >>>> daemon is effectively out of service.
> >>>>
> >>>> I managed to create a patch to fix this and I'm testing it on
> >>>> xen-4.2.2. Will send it out soon.
> >>>
> >>> Are you aware of
> >>>
> >>
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=eeddfad1b339dca
> >>> a787230f519a19de1cbc22ad8
> >>> which is a second patch for XSA-38 ?
> >>>
> >>> ~Andrew
> >>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Liuqiming (John)
> >>>>> Sent: Monday, July 01, 2013 9:47 PM
> >>>>> To: 'xen-devel@xxxxxxxxxxxxx'; 'ian.jackson@xxxxxxxxxxxxx';
> >>>>> 'ian.campbell@xxxxxxxxxx'
> >>>>> Cc: Yanqiangjun
> >>>>> Subject: oxenstored memory leak? seems related with XSA-38
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I tested starting a VM using the xen-4.2.2 release with oxenstored,
> >>>>> and hit a problem that may be related to XSA-38
> >>>>>
> >>>
> >>
> (http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).
> >>>>>
> >>>>> When the VM started, oxenstored's memory usage kept increasing,
> >>>>> eventually reaching 1.5G. The VM hung at the OS loading screen.
> >>>>>
> >>>>> Here is the output of top:
> >>>>>
> >>>>> top - 20:18:32 up 1 day,  3:09,  5 users,  load average: 0.99, 0.63, 0.32
> >>>>> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> >>>>> %Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> >>>>> KiB Mem:  46919428 total, 46699012 used,   220416 free,    36916 buffers
> >>>>> KiB Swap:  2103292 total,        0 used,  2103292 free, 44260932 cached
> >>>>>
> >>>>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
> >>>>>   806 root      20   0  955m 926m 1068 R  99.9  2.0  4:54.14 oxenstored
> >>>>>
> >>>>> top - 20:19:05 up 1 day,  3:09,  5 users,  load average: 0.99, 0.67, 0.34
> >>>>> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> >>>>> %Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> >>>>> KiB Mem:  46919428 total, 46708564 used,   210864 free,    36964 buffers
> >>>>> KiB Swap:  2103292 total,        0 used,  2103292 free, 44168380 cached
> >>>>>
> >>>>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
> >>>>>   806 root      20   0 1048m 1.0g 1068 R 100.2  2.2  5:27.03 oxenstored
> >>>>>
> >>>>> top - 20:21:35 up 1 day,  3:12,  5 users,  load average: 1.00, 0.80, 0.44
> >>>>> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> >>>>> %Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> >>>>> KiB Mem:  46919428 total, 46703052 used,   216376 free,    37208 buffers
> >>>>> KiB Swap:  2103292 total,        0 used,  2103292 free, 43682968 cached
> >>>>>
> >>>>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
> >>>>>   806 root      20   0 1551m 1.5g 1068 R 100.2  3.3  7:56.10 oxenstored
> >>>>>
> >>>>> And the oxenstored log repeated these over and over again:
> >>>>>
> >>>>> [20130701T12:27:14.290Z] D8            invalid  device/suspend/event-channel
> >>>>> ..
> >>>>> [20130701T12:27:14.290Z] D8.1937077039 invalid  /event-channel
> >>>>> ..
> >>>>> [20130701T12:27:14.290Z] D8.1852727656 invalid
> >>>>> ..
> >>>>> [20130701T12:27:14.290Z] D8            debug
> >>>>> [20130701T12:27:14.290Z] D8            debug
> >>>>> [20130701T12:27:14.290Z] D8            debug
> >>>>> [20130701T12:27:14.290Z] D8            debug
> >>>>> [20130701T12:27:14.290Z] D8            debug
> >>>>> [20130701T12:27:14.290Z] D8            debug
> >>>>>
> >>>>> My VM is a Windows guest with the GPL PV driver installed. This
> >>>>> problem is hard to reproduce, and after a hard reboot everything
> >>>>> looks normal.
> >>>>>
> >>>>> I guessed something was wrong with the xenbus IO ring, so I
> >>>>> investigated the code:
> >>>>>
> >>>>> 1) oxenstored and xenbus in the VM use a shared page to communicate
> >>>>>    with each other:
> >>>>>    struct xenstore_domain_interface {
> >>>>>      char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
> >>>>>      char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
> >>>>>      XENSTORE_RING_IDX req_cons, req_prod;
> >>>>>      XENSTORE_RING_IDX rsp_cons, rsp_prod;
> >>>>>    };
> >>>>> 2) xenbus in the VM puts a request in req and increases req_prod,
> >>>>>    then sends an event to oxenstored.
> >>>>> 3) oxenstored calculates how much to read using req_cons and req_prod,
> >>>>>    and after reading increases req_cons until it equals req_prod,
> >>>>>    which means no request is pending.
> >>>>> 4) oxenstored puts responses in rsp and increases rsp_prod, then
> >>>>>    sends an event to the VM; xenbus in the VM uses similar logic to
> >>>>>    handle the rsp ring.
> >>>>>
> >>>>>   Am I correct?
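The four steps above can be sketched as a daemon-side read of the request ring. This is a simplified standalone illustration, not the actual xenstored code: memory barriers and event-channel signalling are omitted, and the index check discussed in this thread is included up front.

```c
#include <stdint.h>
#include <string.h>

#define XENSTORE_RING_SIZE 1024
#define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE - 1))
typedef uint32_t XENSTORE_RING_IDX;

struct xenstore_domain_interface {
    char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
    char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
    XENSTORE_RING_IDX req_cons, req_prod;
    XENSTORE_RING_IDX rsp_cons, rsp_prod;
};

/* Simplified daemon-side read of the request ring (step 3): copy up to
 * len bytes into buf and advance req_cons.  Returns bytes read, 0 if the
 * ring is empty, or -1 if the indices are corrupted (a "bad client"). */
static int ring_read(struct xenstore_domain_interface *intf,
                     char *buf, uint32_t len)
{
    XENSTORE_RING_IDX cons = intf->req_cons, prod = intf->req_prod;
    uint32_t avail = prod - cons;       /* unread bytes (free-running idx) */

    if (avail > XENSTORE_RING_SIZE)
        return -1;                      /* cons/prod manipulated by guest */
    if (avail == 0)
        return 0;
    if (len > avail)
        len = avail;
    /* clamp to the contiguous chunk before the buffer physically wraps */
    if (len > XENSTORE_RING_SIZE - MASK_XENSTORE_IDX(cons))
        len = XENSTORE_RING_SIZE - MASK_XENSTORE_IDX(cons);
    memcpy(buf, intf->req + MASK_XENSTORE_IDX(cons), len);
    intf->req_cons += len;              /* step 3: consume the request */
    return (int)len;
}
```

Without the initial avail check, a guest that rewinds req_prod makes avail wrap to a huge unsigned value, and the daemon keeps "reading" data that was never produced.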
> >>>>>
> >>>>> So, I'm curious about what happens when req_cons is larger than
> >>>>> req_prod (which can be caused by a buggy PV driver or a malicious
> >>>>> guest user); it seems oxenstored will fall into an endless loop.
> >>>>>
> >>>>> Is this what XSA-38 talks about?
> >>>>>
> >>>>> I built a PV driver which sets req_prod to 0 after several xenstore
> >>>>> operations, and tested it on xen-unstable.hg with all the XSA-38
> >>>>> patches applied.
> >>>>> The problem I first met reproduced: oxenstored eventually took a lot
> >>>>> of memory.
> >>>>>
> >>>>> Could anyone help me with this issue?
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>> _______________________________________________
> >>>> Xen-devel mailing list
> >>>> Xen-devel@xxxxxxxxxxxxx
> >>>> http://lists.xen.org/xen-devel
> >
