
Re: [Xen-devel] Xen/arm: Virtual ITS command queue handling



On Fri, 2015-05-15 at 16:56 +0530, Vijay Kilari wrote:
> On Fri, May 15, 2015 at 4:29 PM, Ian Campbell <ian.campbell@xxxxxxxxxx> wrote:
> > On Wed, 2015-05-13 at 15:26 +0100, Julien Grall wrote:
> >> >>>   on that vits;
> >> >>> * On receipt of an interrupt notification arising from Xen's own use
> >> >>>   of `INT`; (see discussion under Completion)
> >> >>> * On any interrupt injection arising from a guest's use of the `INT`
> >> >>>   command; (XXX perhaps, see discussion under Completion)
> >> >>
> >> >> With all the solutions suggested, it is very likely that we will try
> >> >> to execute multiple scheduling passes at the same time.
> >> >>
> >> >> One way is to wait until the previous pass has finished, but that
> >> >> would mean that the scheduler would be executed very often.
> >> >>
> >> >> Or maybe you plan to offload the scheduler to a softirq?
> >> >
> >> > Good point.
> >> >
> >> > A softirq might be one solution, but it is problematic during emulation
> >> > of `CREADR`, when we would like to do a pass immediately to complete any
> >> > operations outstanding for the domain doing the read.
> >> >
> >> > Or just using spin_trylock and not bothering if a pass is already in
> >> > progress might be another. But that has similar problems.
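> >> >
> >> > Something like this, roughly (untested sketch; vits_schedule() and
> >> > the lock are made-up names, spin_trylock()/DEFINE_SPINLOCK() are the
> >> > real Xen primitives):
> >> >
> >> >     /* Skip the pass entirely if another CPU is already running one. */
> >> >     static DEFINE_SPINLOCK(vits_sched_lock);
> >> >
> >> >     void vits_schedule(void)
> >> >     {
> >> >         if ( !spin_trylock(&vits_sched_lock) )
> >> >             return; /* the in-progress pass picks up our work... or not */
> >> >
> >> >         /* ... drain guest queues into the physical command queue ... */
> >> >
> >> >         spin_unlock(&vits_sched_lock);
> >> >     }
> >> >
> >> > The "or not" is exactly the problem for `CREADR` emulation: the pass
> >> > already in progress may have sampled the queues before our update.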
> >> >
> >> > Or we could defer only the scheduling triggered from `INT` (either a
> >> > guest's or Xen's own) to a softirq, but do the ones from `CREADR`
> >> > emulation synchronously? The softirq would be run on return from the
> >> > interrupt handler, and multiple raises would be coalesced I think?
> >>
> >> I think we could defer the scheduling to a softirq for CREADR too,
> >> whichever completion method the guest is using:
> >>       - INT completion: vits.creadr would have been correctly updated
> >> when receiving the INT in Xen.
> >>       - polling completion: the guest will loop on CREADR and will
> >> likely get the info on the next read. The drawback is that the guest
> >> may lose a few instruction cycles.
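> >>
> >> Sketch of what a polling guest does (the register name GITS_CREADR is
> >> real; everything else here is made up):
> >>
> >>     /* Spin until the ITS read pointer reaches the slot just past our
> >>      * last command (ignoring queue wrap for simplicity). A CREADR
> >>      * value that is one exit+enter stale only costs a few more
> >>      * iterations of this loop. */
> >>     while ( readl(gits_base + GITS_CREADR) != next_cmd_offset )
> >>         cpu_relax();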
> >>
> >> Overall, I don't think it's necessary to have an accurate CREADR.
> >
> > Yes, deferring the update by one exit+enter might be tolerable. I added
> > after this list:
> >         This may result in lots of contention on the scheduler
> >         locking. Therefore we consider that in each case all which
> >         happens is the triggering of a softirq which will be processed
> >         on return to guest, and just once even for multiple events.
> >         This is considered OK for the `CREADR` case because at worst
> >         the value read will be one cycle out of date.
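> >
> > As a sketch (VITS_SCHED_SOFTIRQ, vits_sched_pass() and vits_kick() are
> > all invented names; raise_softirq()/open_softirq() are the real Xen
> > interfaces):
> >
> >     /* Every notification source just raises one softirq; raising it
> >      * several times before the next return-to-guest still results in
> >      * a single scheduling pass. */
> >     static void vits_softirq_handler(void)
> >     {
> >         vits_sched_pass();
> >     }
> >
> >     void vits_kick(void) /* from INT handling, CREADR traps, ... */
> >     {
> >         raise_softirq(VITS_SCHED_SOFTIRQ);
> >     }
> >
> >     /* at init time: */
> >     /* open_softirq(VITS_SCHED_SOFTIRQ, vits_softirq_handler); */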
> >
> >
> >
> >>
> >> [..]
> >>
> >> >> AFAIU the process suggested, Xen will inject small batches as long as the
> >> >> physical command queue is not full.
> >> >
> >> >> Let's take a simple case: only a single domain is using the vITS on
> >> >> the platform. If it injects a huge number of commands, Xen will split
> >> >> them into lots of small batches. All batches will be injected in the
> >> >> same pass as long as they fit in the physical command queue. Am I
> >> >> correct?
> >> >
> >> > That's how it is currently written, yes. With the "possible
> >> > simplification" above the answer is no, only one batch at a time would be
> >> > written for each guest.
> >> >
> >> > BTW, it doesn't have to be a single guest, the sum total of the
> >> > injections across all guests could also take a similar amount of time.
> >> > Is that a concern?
> >>
> >> Yes, but the example with only one guest was easier to explain.
> >
> > So as well as limiting the number of commands in each domain's batch we
> > also want to limit the total number of batches?
> >
> >> >> I think we have to restrict the total number of batches (i.e. for all
> >> >> the domains) injected in a single scheduling pass.
> >> >>
> >> >> I would even tend to allow only one in-flight batch per domain. That
> >> >> would limit the possible problem I pointed out.
> >> >
> >> > This is the "possible simplification" I think. Since it simplifies other
> >> > things (I think) as well as addressing this issue I think it might be a
> >> > good idea.
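> >> >
> >> > In terms of state it could be as simple as (sketch, all names
> >> > invented):
> >> >
> >> >     /* Per-domain vITS state: a domain with a batch already on the
> >> >      * physical queue is skipped by the scheduling pass until the
> >> >      * completion INT for that batch arrives and clears the flag. */
> >> >     struct vits_domain {
> >> >         struct list_head link;  /* on the pending-domains list */
> >> >         bool     batch_in_flight;
> >> >         uint32_t batch_tail;    /* guest queue index just past the batch */
> >> >     };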
> >>
> >> With the limit on the number of commands sent per batch, would the
> >> fairness you were talking about in the design doc still be required?
> >
> > I think we still want to schedule the guests in a strict round-robin
> > manner, to avoid one guest monopolising things.
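> >
> > i.e. something like (sketch; vits_pending, vits_inject_batch() and
> > phys_queue_full() are invented, the list helpers are Xen's xen/list.h):
> >
> >     /* Strict round-robin: take domains from the head of a FIFO of
> >      * domains with pending commands; a domain only rejoins the tail
> >      * once its in-flight batch completes, so nobody can monopolise
> >      * the physical queue. */
> >     static void vits_sched_pass(void)
> >     {
> >         while ( !phys_queue_full() && !list_empty(&vits_pending) )
> >         {
> >             struct vits_domain *v =
> >                 list_first_entry(&vits_pending, struct vits_domain, link);
> >
> >             list_del_init(&v->link);
> >             vits_inject_batch(v); /* at most one batch, as above */
> >         }
> >     }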
> >
> >> >>> Therefore it is proposed that the restriction that a single vITS maps
> >> >>> to one pITS be retained. If a guest requires access to devices
> >> >>> associated with multiple pITSs then multiple vITS should be
> >> >>> configured.
> >> >>
> >> >> Having multiple vITS per domain brings other issues:
> >> >>    - How do you know the number of ITSs to describe in the device
> >> >> tree at boot?
> >> >
> >> > I'm not sure. I don't think 1 vs N is very different from the question
> >> > of 0 vs 1 though, somehow the tools need to know about the pITS setup.
> >>
> >> I don't see why the tools would need to know the pITS setup.
> >
> > Even with only a single vITS the tools need to know if the system has 0,
> > 1, or more pITSs, to know whether to create a vITS at all or not.
> >
> >> >>    - How do you tell the guest that the PCI device is mapped to a
> >> >> specific vITS?
> >> >
> >> > Device Tree or IORT, just like on native and just like we'd have to tell
> >> > the guest about that mapping even if there was a single vITS.
> >>
> >> Right, although the root controller can only be attached to one ITS.
> >>
> >> It will be necessary to have multiple root controllers in the guest if
> >> we pass through devices using different ITSs.
> >>
> >> Is pci-back able to expose multiple root controllers?
> >
> > In principle the xenstore protocol supports it, but AFAIK all toolstacks
> > have only ever used "bus" 0, so I wouldn't be surprised if there were
> > bugs lurking.
> >
> > But we could fix those, I don't think it is a requirement that this
> > stuff suddenly springs into life on ARM even with existing kernels.
> >
> >> > I think the complexity of having one vITS target multiple pITSs is going
> >> > to be quite high in terms of data structures and the amount of
> >> > thinking/tracking scheduler code will have to do, mostly down to out of
> >> > order completion of things put in the pITS queue.
> >>
> >> I understand the complexity, but exposing one vITS per pITS means that we
> >> are exposing the underlying hardware to the guest.
> >
> > Some aspect of it, yes, but it is still a virtual ITS.
> >
> >> That brings a lot of complexity into the guest layout, which is right
> >> now static. How do you decide the number of vITSs/root controllers
> >> exposed (think about PCI hotplug)?
> >>
> >> Given that PCI passthrough doesn't allow migration, maybe we could use
> >> the layout of the hardware.
> >
> > That's an option.
> >
> >> If we are going to expose multiple vITS to the guest, we should only use
> >> vITS for guests using PCI passthrough. This is because migration won't be
> >> compatible with it.
> >
> > It would be possible to support one s/w-only vITS for migration, i.e. the
> > evtchn thing at the end, but for the general case that is correct. On
> > x86 I believe that if you hot unplug all passthrough devices you can
> > migrate and then plug in other devices at the other end.
> >
> > Anyway, more generally there are certainly problems with multiple vITS.
> > However there are also problems with a single vITS feeding multiple
> > pITSs:
> >
> >       * What to do with global commands? Inject to all pITS and then
> >         synchronise on them all finishing.
> >       * Handling of out of order completion of commands queued with
> >         different pITS, since the vITS must appear to complete in order.
> >         Apart from the book keeping question it makes scheduling more
> >         interesting:
> >               * What if you have a pITS with slots available, and the
> >                 guest command queue contains commands which could go to
> >                 the pITS, but behind ones which are targetting another
> >                 pITS which has no slots
> >               * What if one pITS is very busy and another is mostly idle
> >                 and a guest submits one command to the busy one
> >                 (contending with other guests) followed by a load of
> >                 commands targeting the idle one. Those commands would be
> >                 held up in this situation.
> >               * Reasoning about fairness may be harder.
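> >
> > The book-keeping sketch mentioned above (names invented): per guest
> > command we would have to remember something like
> >
> >     /* Each guest command is tagged with the pITS it was dispatched
> >      * to; the virtual CREADR may only advance past a command once
> >      * that pITS has completed it, even if later commands sent to
> >      * other pITSs finished long ago. */
> >     struct vits_cmd_track {
> >         uint32_t     vcmd_idx; /* index in the guest's command queue */
> >         unsigned int pits;     /* physical ITS it was dispatched to */
> >         bool         done;     /* completed on that pITS? */
> >     };
> >
> > with virtual CREADR derived from the first entry still !done, so one
> > congested pITS stalls the visible progress of everything behind it.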
> >
> > I've put both your list and mine into the next revision of the document.
> > I think this remains an important open question.
> >
> 
> Handling of a single vITS and multiple pITSs can be made simple.
> 
> All ITS commands except SYNC & INVALL have a device ID, which tells
> us which pITS each command should be sent to.
> 
> A guest's SYNC & INVALL can be dropped by Xen, with Xen appending
> SYNC & INVALL wherever they are required
> (e.g. the Linux driver adds SYNC after the commands that need it).
> With this assumption, every ITS command maps to a single pITS
> and there is no need for synchronisation across pITSs.
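>
> Roughly (sketch only; every name here is invented):
>
>     static void vits_route_cmd(struct vits_cmd *cmd)
>     {
>         struct pits *pits;
>
>         switch ( cmd->code )
>         {
>         case CMD_SYNC:
>         case CMD_INVALL:
>             /* Dropped: Xen emits its own SYNC/INVALL per pITS instead. */
>             break;
>         default:
>             pits = dev_to_pits(cmd->devid); /* device ID -> physical ITS */
>             pits_queue(pits, cmd);
>             pits_mark_needs_sync(pits);     /* Xen appends SYNC later */
>             break;
>         }
>     }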

You've ignored the second bullet and its three sub-bullets, I think.

Ian.


