Xen project Mailing List

Re: [Xen-devel] NUMA TODO-list for xen-devel

To: Dario Faggioli <raistlin@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxx>

From: "Zhang, Yang Z" <yang.z.zhang@xxxxxxxxx>

Date: Thu, 2 Aug 2012 01:04:54 +0000

Accept-language: en-US

Cc: Andre Przywara <andre.przywara@xxxxxxx>, Anil Madhavapeddy <anil@xxxxxxxxxx>, George Dunlap <dunlapg@xxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>

Delivery-date: Thu, 02 Aug 2012 01:05:58 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: AQHNcAER/p/FU/fZGkSxhN/UzIuHTZdFsI5g

Thread-topic: NUMA TODO-list for xen-devel

Dario Faggioli wrote on 2012-08-02: > Hi everyone, > > With automatic placement finally landing into xen-unstable, I stated > thinking about what I could work on next, still in the field of > improving Xen's NUMA support. Well, it turned out that running out of > things to do is not an option! :-O > > In fact, I can think of quite a bit of open issues in that area, that I'm > just braindumping here. If anyone has thoughts or idea or feedback or > whatever, I'd be happy to serve as a collector of them. I've already > created a Wiki page to help with the tracking. You can see it here > (for now it basically replicates this e-mail): > > http://wiki.xen.org/wiki/Xen_NUMA_Roadmap > I'm putting a [D] (standing for Dario) near the points I've started > working on or looking at, and again, I'd be happy to try tracking this > too, i.e., keeping the list of "who-is-doing-what" updated, in order to > ease collaboration. > > So, let's cut the talking: > > - Automatic placement at guest creation time. Basics are there and > will be shipping with 4.2. However, a lot of other things are > missing and/or can be improved, for instance: > [D] * automated verification and testing of the placement; > * benchmarks and improvements of the placement heuristic; > [D] * choosing/building up some measure of node load (more accurate > than just counting vcpus) onto which to rely during placement; > * consider IONUMA during placement; We should consider two things: 1. Dom0 IONUMA: Devices used by dom0 should get the dma buffer from the node which it resides. Currently, Dom0 allocates dma buffer without provide the node info to the hypercall.. 2.Guest IONUMA: when guest boots up with pass through device, we need to allocate the memory from the node where the device resides for further dma buffer allocation. And let guest know the IONUMA topology. This rely on the guest NUMA. This topic was mentioned in xen summit 2011: http://xen.org/files/xensummit_seoul11/nov2/5_XSAsia11_KTian_IO_Scalability_in_Xen.pdf > * automatic placement of Dom0, if possible (my current series is > only affecting DomU) * having internal xen data structure > honour the placement (e.g., I've been told that right now vcpu > stacks are always allocated on node 0... Andrew?). > [D] - NUMA aware scheduling in Xen. Don't pin vcpus on nodes' pcpus, > just have them _prefer_ running on the nodes where their memory > is. > [D] - Dynamic memory migration between different nodes of the host. As > the counter-part of the NUMA-aware scheduler. > - Virtual NUMA topology exposure to guests (a.k.a guest-numa). If a > guest ends up on more than one nodes, make sure it knows it's > running on a NUMA platform (smaller than the actual host, but > still NUMA). This interacts with some of the above points: > * consider this during automatic placement for > resuming/migrating domains (if they have a virtual topology, > better not to change it); * consider this during memory > migration (it can change the actual topology, should we update > it on-line or disable memory migration?) > - NUMA and ballooning and memory sharing. In some more details: > * page sharing on NUMA boxes: it's probably sane to make it > possible disabling sharing pages across nodes; * ballooning and > its interaction with placement (races, amount of memory needed > and reported being different at different time, etc.). > - Inter-VM dependencies and communication issues. If a workload is > made up of more than just a VM and they all share the same (NUMA) > host, it might be best to have them sharing the nodes as much as > possible, or perhaps do right the opposite, depending on the > specific characteristics of he workload itself, and this might be > considered during placement, memory migration and perhaps > scheduling. > - Benchmarking and performances evaluation in general. Meaning both > agreeing on a (set of) relevant workload(s) and on how to extract > meaningful performances data from there (and maybe how to do that > automatically?). > So, what do you think? > > Thanks and Regards, > Dario > > -- <<This happens because I choose it to happen!>> (Raistlin Majere) > ----------------------------------------------------------------- Dario > Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software > Engineer, Citrix Systems R&D Ltd., Cambridge (UK) > > > -- <<This happens because I choose it to happen!>> (Raistlin Majere) > ----------------------------------------------------------------- Dario > Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software > Engineer, Citrix Systems R&D Ltd., Cambridge (UK) Best regards, Yang _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.