Re: [Xen-devel] arm64: Approach for DT based NUMA and issues
On Tue, Nov 29, 2016 at 12:29 AM, Julien Grall <julien.grall@xxxxxxx> wrote:
>
>
> On 26/11/16 06:59, Vijay Kilari wrote:
>>
>> Hi,
>
>
> Hi Vijay,
>
> This mail is mixing two distinct problems:
>    1) Making Xen NUMA-aware
>    2) Making DOM0 NUMA-aware
>
> As mentioned in another part of this thread, those problems should be
> handled one by one rather than together.
>
> I will focus on problem 1) while answering this e-mail.
>
>
>> Below is a basic write-up on DT based NUMA feature support for the arm64
>> platform. I have attempted to get NUMA support working, however I face
>> the issues below and would like to discuss them. Please let me know your
>> comments on this. I have yet to look at ACPI support.
>>
>> DT based NUMA support for arm64 platform
>> ========================================
>> For Xen boot on NUMA arm64 platform, Xen needs to parse
>> CPU and Memory nodes for DT based booting mechanism. Here I would
>> like to discuss the DT based booting mechanism and the issues
>> related to it.
>>
>> 1) Parsing CPU and Memory nodes:
>> ---------------------------------------------------
>>
>> The NUMA information associated with CPU and Memory is passed in DT
>> using the numa-node-id u32 integer value. More information about the NUMA
>> binding is available in the Linux kernel at
>> Documentation/devicetree/bindings/numa.txt
>>
>> Similar to the Linux kernel, cpu and memory nodes of the DT are parsed
>> and numa-node-id information is populated in the cpu_parsed and
>> memory_parsed node_t masks.
>>
>> When booting in UEFI mode, UEFI passes memory information to Dom0
>> using the EFI memory descriptor table and deletes the memory nodes
>> from the host DT. However, to fetch the memory NUMA node id, the memory DT
>> node should not be deleted by the EFI stub.
>>
>> ISSUE: When the memory node is _NOT_ deleted by the EFI stub from the host
>> DT, Xen identifies the memory node [xen/arch/arm/bootfdt.c,
>> early_scan_node()], which adds memory ranges to the bootinfo.mem structure,
>> thereby adding a duplicate entry, and eventually initialization fails.
>>
>> Possible Solution: While adding a new memory region to bootinfo.mem, check
>> for duplicate entries and back off if the entry is already available from
>> the UEFI mem info table.
>
>
> I think we should have a different approach. I actually like the approach
> suggested by Andre in [1], which is: if the UEFI memory map exists (i.e.
> bootinfo.mem is already filled), then the DT is only used to get NUMA node
> information.
>
>>
>> 2) Parsing CPU nodes:
>> ---------------------------------
>> The CPU nodes are parsed to extract numa-node-id info for each cpu and
>> cpu_nodemask is populated.
>>
>> The MPIDR register value is read for each CPU and cpu_to_node[] is
>> populated.
>
>
> To emphasize here, cpu_to_node will be indexed using Xen CPUID and not MPIDR.
> They can be different and Xen does not have a clue of the MPIDR except in
> very few places.
>
>>
>> 3) Parsing Memory nodes:
>> --------------------------------------
>> For all the DT memory nodes in the flattened DT, the start address, size
>> and numa-node-id value are extracted and stored in "node_memblk_range[]",
>> which is of type struct node.
>>
>> Each bootinfo.mem entry from UEFI is verified against node_memblk_range[]
>> and NODE_DATA is populated with start PFN, end PFN and nodeid.
>>
>> Populating memnodemap:
>>
>> The memnodemap[] is allocated from heap and, using the NODE_DATA structure,
>> the memnodemap[] is populated with the nodeid for each page index.
>>
>> This memnodemap info is used by the memory allocator to fetch the memory
>> node id for a given page by calling phys_to_nid().
>>
>> ISSUE: phys_to_nid() is called by the memory allocator before memnodemap[]
>> is initialized.
>>
>> Since memnodemap[] is allocated from heap, the boot allocator has to be
>> initialized first. The boot_allocator() needs phys_to_nid(), which is not
>> available until memnodemap[] is initialized. So there is a deadlock
>> situation during initialization. To overcome this, phys_to_nid() should
>> rely on node_memblk_range[] to get the nodeid until memnodemap[] is
>> initialized.
>
>
> Looking at the code, boot_allocator() does not need phys_to_nid until the
> end. So it would be perfectly fine to use alloc_boot_pages to allocate
> memnodemap.
>
>>
>> 4) Generating memory nodes for DOM0
>> ---------------------------------------------------------
>> Linux kernel device drivers that use devm_kzalloc() try to allocate
>> memory from the local memory node. So Dom0 needs to have memory allocated
>> on all the available nodes of the system.
>>
>> Ex: SMMU driver of device on node 1 tries to allocate memory
>> on node 1.
>>
>> ISSUE:
>> - Dom0's memory should be split across all the available memory nodes
>> of the system and memory nodes should be generated accordingly.
>> - Memory DT node generated by Xen for Dom0 should populate numa-node-id
>> information.
>
>
> If you drop numa-node-id property from every node, DOM0 will not try to use
> NUMA. Is there any specific reason to not do that?

If we drop numa-node-id from the memory node generated for dom0, then dom0
will assume all the memory is from node0. So eventually node1 device
initialization fails.

>
> Those properties could be re-introduced later on when vNUMA will be brought
> up.
>
> Regards,
>
> [1]
> https://lists.xenproject.org/archives/html/xen-devel/2016-11/msg02499.html
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
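
The phys_to_nid() ordering problem from section 3 can be looked at in
isolation. What follows is a minimal standalone sketch, not Xen code: the
struct layout, add_node_range(), populate_memnodemap(), the chunk-granular
map and the NUMA_NO_NODE value are all assumptions made for illustration.
It shows the fallback Vijay proposes, where phys_to_nid() scans
node_memblk_range[] until memnodemap[] has been filled. The caller-supplied
buffer stands in for memory that, following Julien's suggestion, could come
from alloc_boot_pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE 0xFF   /* assumed "unknown node" marker */
#define MAX_RANGES   8      /* assumed capacity for this example */

typedef uint64_t paddr_t;
typedef uint8_t  nodeid_t;

/* One DT memory range and its numa-node-id (hypothetical layout). */
struct node_range {
    paddr_t  start;  /* inclusive */
    paddr_t  end;    /* exclusive */
    nodeid_t nid;
};

static struct node_range node_memblk_range[MAX_RANGES];
static unsigned int nr_ranges;

/* Lookup table, one entry per (1 << memnode_shift) bytes; NULL until set up. */
static nodeid_t *memnodemap;
static size_t memnodemap_size;
static unsigned int memnode_shift;

/* Record a range parsed from a DT memory node. Returns 0 on success. */
int add_node_range(paddr_t start, paddr_t end, nodeid_t nid)
{
    if (nr_ranges >= MAX_RANGES || start >= end)
        return -1;
    node_memblk_range[nr_ranges].start = start;
    node_memblk_range[nr_ranges].end = end;
    node_memblk_range[nr_ranges].nid = nid;
    nr_ranges++;
    return 0;
}

/* Slow path: linear scan of the parsed ranges. */
static nodeid_t range_to_nid(paddr_t addr)
{
    unsigned int i;

    for (i = 0; i < nr_ranges; i++)
        if (addr >= node_memblk_range[i].start &&
            addr < node_memblk_range[i].end)
            return node_memblk_range[i].nid;
    return NUMA_NO_NODE;
}

/*
 * Fill the map from the parsed ranges. Range starts are assumed to be
 * aligned to the chunk size (1 << shift).
 */
void populate_memnodemap(nodeid_t *buf, size_t entries, unsigned int shift)
{
    size_t i;
    unsigned int r;
    paddr_t addr;

    assert(buf != NULL && shift < 64);

    for (i = 0; i < entries; i++)
        buf[i] = NUMA_NO_NODE;

    for (r = 0; r < nr_ranges; r++)
        for (addr = node_memblk_range[r].start;
             addr < node_memblk_range[r].end;
             addr += (paddr_t)1 << shift)
            if ((addr >> shift) < entries)
                buf[addr >> shift] = node_memblk_range[r].nid;

    /* Publish the map only once it is fully populated. */
    memnode_shift = shift;
    memnodemap_size = entries;
    memnodemap = buf;
}

/* Fast path once the map exists, range scan before that. */
nodeid_t phys_to_nid(paddr_t addr)
{
    if (memnodemap != NULL) {
        paddr_t idx = addr >> memnode_shift;

        return idx < memnodemap_size ? memnodemap[idx] : NUMA_NO_NODE;
    }
    return range_to_nid(addr);
}
```

In this sketch phys_to_nid() is safe to call at any point after the DT
ranges have been recorded, so the allocator never observes an uninitialized
map. If the boot allocator really does not need phys_to_nid() until the end
of boot, as Julien points out, the fallback branch would likely be
unnecessary and only the population step would remain.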