
Re: [Xen-devel] [Hackathon minutes] PV frontends/backends and NUMA machines

On Mon, May 20, 2013 at 2:44 PM, Stefano Stabellini
<stefano.stabellini@xxxxxxxxxxxxx> wrote:
> Hi all,
> these are my notes from the discussion that we had at the Hackathon
> regarding PV frontends and backends running on NUMA machines.
> ---
> The problem: how can we make sure that frontends and backends run in the
> same NUMA node?
> We would need to run one backend kthread per NUMA node: we have already
> one kthread per netback vif (one per guest), we could pin each of them
> on a different NUMA node, the same one the frontend is running on.
> But that means that dom0 would be running on several NUMA nodes at once;
> how much of a performance penalty would that be?
> We would need to export NUMA information to dom0, so that dom0 can make
> smart decisions on memory allocations and we would also need to allocate
> memory for dom0 from multiple nodes.
> We need a way to automatically allocate the initial dom0 memory in Xen
> in a NUMA-aware way and we need Xen to automatically create one dom0 vcpu
> per NUMA node.
> After dom0 boots, the toolstack is going to decide where to place any
> new guests: it allocates the memory from the NUMA node it wants to run
> the guest on and it is going to ask dom0 to allocate the kthread from
> that node too. (Maybe writing the NUMA node on xenstore.)
> We need to make sure that the interrupts/MSIs coming from the NIC arrive
> on the same pcpu that is running the vcpu that needs to receive them.
> We need to do irqbalancing in dom0; Xen will then automatically make the
> physical MSIs follow the vcpu.
> If the card is multiqueue we need to make sure that we use the multiple
> queues so that we can have different sources of interrupts/MSIs for
> each vif. This allows us to independently notify each dom0 vcpu.
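The placement flow described above (pick a node for the guest, allocate its memory there, pin the backend kthread to that node's pcpus, and point the vif's MSI at one of them) can be sketched in userspace terms. This is a minimal illustration, not actual Xen or toolstack code: the topology dict, the least-loaded `choose_node` policy, and the function names are all assumptions.

```python
# Sketch of the NUMA placement flow discussed above. The topology,
# the load-balancing policy, and the helper names are illustrative
# assumptions, not the real toolstack interface.

def choose_node(topology, guests_per_node):
    """Pick the node with the fewest guests (simple least-loaded policy)."""
    return min(topology, key=lambda node: guests_per_node.get(node, 0))

def backend_cpus(topology, node):
    """CPUs the backend kthread should be pinned to: those of the guest's node."""
    return set(topology[node])

def smp_affinity_mask(cpus):
    """Format a cpu set as the hex bitmask used by /proc/irq/<N>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

# Example: two nodes of four pcpus each.
topology = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
node = choose_node(topology, {0: 3, 1: 1})  # node 1 is less loaded
cpus = backend_cpus(topology, node)         # {4, 5, 6, 7}
print(node, smp_affinity_mask(cpus))        # prints: 1 f0
```

The toolstack would then record the chosen node (e.g. under a per-vif xenstore path, as suggested above) so that dom0 can pin the matching kthread and steer the vif's interrupt to one of those pcpus.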

So the work items I remember are as follows:
1. Implement NUMA affinity for vcpus
2. Implement Guest NUMA support for PV guests
3. Teach Xen how to make a sensible NUMA allocation layout for dom0
4. Teach the toolstack to pin the netback threads to dom0 vcpus
running on the correct node(s)

Dario will do #1.  I volunteered to take a stab at #2 and #3.  #4 we
should be able to do independently of 2 and 3 -- it should give a
slight performance improvement due to cache proximity even if dom0
memory is striped across the nodes.

Does someone want to volunteer to take a look at #4?  I suspect that
the core technical implementation will be simple, but getting a stable
design that everyone is happy with for the future will take a
significant number of iterations.  Learn from my fail w/ USB hot-plug
in 4.3, and start the design process early. :-)
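For #4, the core mechanism really is small: find the vif's netback kthread and restrict it to the pcpus of the right node. A rough userspace sketch, assuming Linux's standard `/sys/devices/system/node/nodeN/cpulist` layout; how the toolstack resolves a vif to a kthread pid is left out, and `pin_thread_to_node` is a hypothetical name.

```python
import os

def parse_cpulist(text):
    """Parse a sysfs cpulist string such as '0-3,8' into a set of cpu numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def pin_thread_to_node(pid, node):
    """Pin a thread (e.g. a netback kthread) to the cpus of one NUMA node."""
    path = "/sys/devices/system/node/node%d/cpulist" % node
    with open(path) as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(pid, cpus)  # pinning other processes needs privilege
```

As the mail says, the hard part is not this pinning call but agreeing on where the vif-to-node mapping lives (xenstore key, who writes it, when), so that the design survives future iterations.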


Xen-devel mailing list