(Orran/Jimi cc'ed, see question below...)
> > I understand and sympathize with the need for dom0 to
> > sometimes get and use information from each processor that is
> > only available if dom0 is running on each processor.
> > However, AFAIK, SMP guests are always gang-scheduled, correct?
> No, there's no need to strictly gang schedule, and the
> current scheduler makes no attempt to do so. It may generally
> be a decent thing to do, though.
> > (If not, aren't there some very knotty research issues
> > related to locking and forward progress?)
> You could end up preempting a vCPU holding a lock which could
> lead to daft behaviour of naïve spin locks. A number of
> possible workarounds have been prototyped, but since it
> doesn't seem to be much of a problem in practice nothing has
> been checked in.
I wonder if "not a problem in practice" is more of an indication
of lack of practice than lack of problem. I can see that the
problem would be unlikely to occur with small numbers of
processors and one SMP guest running a highly scalable SMP app
(such as a web server), but I'll bet a real enterprise load
of home-grown SMP apps running in an IT shop that's had big SMP
boxes for years would see the problem more quickly, especially
after multiple SMP guests are consolidated onto a single box.
I believe ppc has "paravirtualized spinlocks" in their Linux
kernel, though even this won't necessarily help with a poorly
written SMP application.
No data, admittedly, but perhaps our good buddies at
Watson could comment?
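
To make the worry concrete, here is a toy model of the lock-holder
preemption case. It is a sketch, not measured data. It assumes the
holder vCPU is descheduled right after taking the lock, that every
waiter sits on its own PCPU, and that a waiter which yields stays off
its PCPU until the lock is released. The function name and its
parameters are hypothetical, and the yielding variant only stands in
for the general idea behind paravirtualized spinlocks, not for any
particular kernel's implementation.

```python
def wasted_spin_ticks(waiters, hold_ticks, preempt_ticks, spin_limit=None):
    """Count PCPU ticks burned spinning on a lock whose holder was preempted.

    waiters       -- number of vCPUs spinning on the lock
    hold_ticks    -- run time the holder still needs before it releases
    preempt_ticks -- how long the holder vCPU stays descheduled
    spin_limit    -- None models a naive spinlock; an integer models a
                     lock whose waiters yield after that many spins
    """
    remaining_hold = hold_ticks
    wasted = 0
    tick = 0
    while remaining_hold > 0:
        # The holder only makes progress once it is back on a PCPU.
        if tick >= preempt_ticks:
            remaining_hold -= 1
        # Each waiter that is still spinning burns one tick of its PCPU.
        if spin_limit is None or tick < spin_limit:
            wasted += waiters
        tick += 1
    return wasted


if __name__ == "__main__":
    # Holder and waiters run together (gang-scheduled): short wait.
    print(wasted_spin_ticks(waiters=3, hold_ticks=2, preempt_ticks=0))
    # Holder preempted, naive spinning: waiters burn the whole gap.
    print(wasted_spin_ticks(waiters=3, hold_ticks=2, preempt_ticks=10))
    # Holder preempted, waiters yield after 4 spins.
    print(wasted_spin_ticks(waiters=3, hold_ticks=2, preempt_ticks=10,
                            spin_limit=4))
```

In this model the waste grows with both the number of waiters and the
length of the preemption, which is why I'd expect consolidating several
large SMP guests onto one box to expose the problem sooner.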
> > So on a 16-processor system, every time dom0 needs to run
> > (e.g. to handle backend I/O for any one of perhaps hundreds
> > of domains), *every* domain gets descheduled so that dom0 can
> > be (gang-)scheduled on all 16 processors?
> > If true, this sounds like a _horrible_ performance hit, so I
> > hope I'm misunderstanding something...
> This isn't an issue.
> After booting you probably want dom0 to give up all but 1 vCPU anyway.
Unless of course the PCPUs have data that changes over time, such
as variable cycle rate (for power management) or hot-plug memory...
Xen-devel mailing list