[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Xen-devel] [PATCH RFC] mm/memory_hotplug: Introduce memory block types
- To: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, linux-mm@xxxxxxxxx
- From: David Hildenbrand <david@xxxxxxxxxx>
- Date: Mon, 1 Oct 2018 11:13:43 +0200
- Autocrypt: addr=david@xxxxxxxxxx; prefer-encrypt=mutual; keydata= xsFNBFXLn5EBEAC+zYvAFJxCBY9Tr1xZgcESmxVNI/0ffzE/ZQOiHJl6mGkmA1R7/uUpiCjJ dBrn+lhhOYjjNefFQou6478faXE6o2AhmebqT4KiQoUQFV4R7y1KMEKoSyy8hQaK1umALTdL QZLQMzNE74ap+GDK0wnacPQFpcG1AE9RMq3aeErY5tujekBS32jfC/7AnH7I0v1v1TbbK3Gp XNeiN4QroO+5qaSr0ID2sz5jtBLRb15RMre27E1ImpaIv2Jw8NJgW0k/D1RyKCwaTsgRdwuK Kx/Y91XuSBdz0uOyU/S8kM1+ag0wvsGlpBVxRR/xw/E8M7TEwuCZQArqqTCmkG6HGcXFT0V9 PXFNNgV5jXMQRwU0O/ztJIQqsE5LsUomE//bLwzj9IVsaQpKDqW6TAPjcdBDPLHvriq7kGjt WhVhdl0qEYB8lkBEU7V2Yb+SYhmhpDrti9Fq1EsmhiHSkxJcGREoMK/63r9WLZYI3+4W2rAc UucZa4OT27U5ZISjNg3Ev0rxU5UH2/pT4wJCfxwocmqaRr6UYmrtZmND89X0KigoFD/XSeVv jwBRNjPAubK9/k5NoRrYqztM9W6sJqrH8+UWZ1Idd/DdmogJh0gNC0+N42Za9yBRURfIdKSb B3JfpUqcWwE7vUaYrHG1nw54pLUoPG6sAA7Mehl3nd4pZUALHwARAQABzSREYXZpZCBIaWxk ZW5icmFuZCA8ZGF2aWRAcmVkaGF0LmNvbT7CwX4EEwECACgFAljj9eoCGwMFCQlmAYAGCwkI BwMCBhUIAgkKCwQWAgMBAh4BAheAAAoJEE3eEPcA/4Na5IIP/3T/FIQMxIfNzZshIq687qgG 8UbspuE/YSUDdv7r5szYTK6KPTlqN8NAcSfheywbuYD9A4ZeSBWD3/NAVUdrCaRP2IvFyELj xoMvfJccbq45BxzgEspg/bVahNbyuBpLBVjVWwRtFCUEXkyazksSv8pdTMAs9IucChvFmmq3 jJ2vlaz9lYt/lxN246fIVceckPMiUveimngvXZw21VOAhfQ+/sofXF8JCFv2mFcBDoa7eYob s0FLpmqFaeNRHAlzMWgSsP80qx5nWWEvRLdKWi533N2vC/EyunN3HcBwVrXH4hxRBMco3jvM m8VKLKao9wKj82qSivUnkPIwsAGNPdFoPbgghCQiBjBe6A75Z2xHFrzo7t1jg7nQfIyNC7ez MZBJ59sqA9EDMEJPlLNIeJmqslXPjmMFnE7Mby/+335WJYDulsRybN+W5rLT5aMvhC6x6POK z55fMNKrMASCzBJum2Fwjf/VnuGRYkhKCqqZ8gJ3OvmR50tInDV2jZ1DQgc3i550T5JDpToh dPBxZocIhzg+MBSRDXcJmHOx/7nQm3iQ6iLuwmXsRC6f5FbFefk9EjuTKcLMvBsEx+2DEx0E UnmJ4hVg7u1PQ+2Oy+Lh/opK/BDiqlQ8Pz2jiXv5xkECvr/3Sv59hlOCZMOaiLTTjtOIU7Tq 7ut6OL64oAq+zsFNBFXLn5EBEADn1959INH2cwYJv0tsxf5MUCghCj/CA/lc/LMthqQ773ga uB9mN+F1rE9cyyXb6jyOGn+GUjMbnq1o121Vm0+neKHUCBtHyseBfDXHA6m4B3mUTWo13nid 0e4AM71r0DS8+KYh6zvweLX/LL5kQS9GQeT+QNroXcC1NzWbitts6TZ+IrPOwT1hfB4WNC+X 2n4AzDqp3+ILiVST2DT4VBc11Gz6jijpC/KI5Al8ZDhRwG47LUiuQmt3yqrmN63V9wzaPhC+ xbwIsNZlLUvuRnmBPkTJwwrFRZvwu5GPHNndBjVpAfaSTOfppyKBTccu2AXJXWAE1Xjh6GOC 8mlFjZwLxWFqdPHR1n2aPVgoiTLk34LR/bXO+e0GpzFXT7enwyvFFFyAS0Nk1q/7EChPcbRb hJqEBpRNZemxmg55zC3GLvgLKd5A09MOM2BrMea+l0FUR+PuTenh2YmnmLRTro6eZ/qYwWkC u8FFIw4pT0OUDMyLgi+GI1aMpVogTZJ70FgV0pUAlpmrzk/bLbRkF3TwgucpyPtcpmQtTkWS gDS50QG9DR/1As3LLLcNkwJBZzBG6PWbvcOyrwMQUF1nl4SSPV0LLH63+BrrHasfJzxKXzqg rW28CTAE2x8qi7e/6M/+XXhrsMYG+uaViM7n2je3qKe7ofum3s4vq7oFCPsOgwARAQABwsFl BBgBAgAPBQJVy5+RAhsMBQkJZgGAAAoJEE3eEPcA/4NagOsP/jPoIBb/iXVbM+fmSHOjEshl KMwEl/m5iLj3iHnHPVLBUWrXPdS7iQijJA/VLxjnFknhaS60hkUNWexDMxVVP/6lbOrs4bDZ NEWDMktAeqJaFtxackPszlcpRVkAs6Msn9tu8hlvB517pyUgvuD7ZS9gGOMmYwFQDyytpepo YApVV00P0u3AaE0Cj/o71STqGJKZxcVhPaZ+LR+UCBZOyKfEyq+ZN311VpOJZ1IvTExf+S/5 lqnciDtbO3I4Wq0ArLX1gs1q1XlXLaVaA3yVqeC8E7kOchDNinD3hJS4OX0e1gdsx/e6COvy qNg5aL5n0Kl4fcVqM0LdIhsubVs4eiNCa5XMSYpXmVi3HAuFyg9dN+x8thSwI836FoMASwOl C7tHsTjnSGufB+D7F7ZBT61BffNBBIm1KdMxcxqLUVXpBQHHlGkbwI+3Ye+nE6HmZH7IwLwV W+Ajl7oYF+jeKaH4DZFtgLYGLtZ1LDwKPjX7VAsa4Yx7S5+EBAaZGxK510MjIx6SGrZWBrrV TEvdV00F2MnQoeXKzD7O4WFbL55hhyGgfWTHwZ457iN9SgYi1JLPqWkZB0JRXIEtjd4JEQcx +8Umfre0Xt4713VxMygW0PnQt5aSQdMD58jHFxTk092mU+yIHj5LeYgvwSgZN4airXk5yRXl SE+xAvmumFBY
- Cc: Kate Stewart <kstewart@xxxxxxxxxxxxxxxxxxx>, Rich Felker <dalias@xxxxxxxx>, linux-ia64@xxxxxxxxxxxxxxx, linux-sh@xxxxxxxxxxxxxxx, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>, Balbir Singh <bsingharora@xxxxxxxxx>, Heiko Carstens <heiko.carstens@xxxxxxxxxx>, Pavel Tatashin <pavel.tatashin@xxxxxxxxxxxxx>, Michal Hocko <mhocko@xxxxxxxx>, Paul Mackerras <paulus@xxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, Rashmica Gupta <rashmica.g@xxxxxxxxx>, "K. Y. Srinivasan" <kys@xxxxxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, linux-s390@xxxxxxxxxxxxxxx, Michael Neuling <mikey@xxxxxxxxxxx>, Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>, Yoshinori Sato <ysato@xxxxxxxxxxxxxxxxxxxx>, Michael Ellerman <mpe@xxxxxxxxxxxxxx>, linux-acpi@xxxxxxxxxxxxxxx, Ingo Molnar <mingo@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Rob Herring <robh@xxxxxxxxxx>, Len Brown <lenb@xxxxxxxxxx>, Fenghua Yu <fenghua.yu@xxxxxxxxx>, Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx>, "mike.travis@xxxxxxx" <mike.travis@xxxxxxx>, Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>, Dan Williams <dan.j.williams@xxxxxxxxx>, Jonathan Neuschäfer <j.neuschaefer@xxxxxxx>, Nicholas Piggin <npiggin@xxxxxxxxx>, Joe Perches <joe@xxxxxxxxxxx>, Jérôme Glisse <jglisse@xxxxxxxxxx>, Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Andy Lutomirski <luto@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>, Oscar Salvador <osalvador@xxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, Tony Luck <tony.luck@xxxxxxxxx>, Mathieu Malaterre <malat@xxxxxxxxxx>, Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>, "Rafael J. Wysocki" <rjw@xxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, Mauricio Faria de Oliveira <mauricfo@xxxxxxxxxxxxxxxxxx>, Philippe Ombredanne <pombredanne@xxxxxxxx>, Martin Schwidefsky <schwidefsky@xxxxxxxxxx>, devel@xxxxxxxxxxxxxxxxxxxxxx, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, linuxppc-dev@xxxxxxxxxxxxxxxx, "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
- Delivery-date: Mon, 01 Oct 2018 09:14:10 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
- Openpgp: preference=signencrypt
On 28/09/2018 19:02, Dave Hansen wrote:
> It's really nice if these kinds of things are broken up. First, replace
> the old want_memblock parameter, then add the parameter to the
> __add_page() calls.
Definitely, once we agree that is is not nuts, I will split it up for
the next version :)
>
>> +/*
>> + * NONE: No memory block is to be created (e.g. device memory).
>> + * NORMAL: Memory block that represents normal (boot or hotplugged) memory
>> + * (e.g. ACPI DIMMs) that should be onlined either automatically
>> + * (memhp_auto_online) or manually by user space to select a
>> + * specific zone.
>> + * Applicable to memhp_auto_online.
>> + * STANDBY: Memory block that represents standby memory that should only
>> + * be onlined on demand by user space (e.g. standby memory on
>> + * s390x), but never automatically by the kernel.
>> + * Not applicable to memhp_auto_online.
>> + * PARAVIRT: Memory block that represents memory added by
>> + * paravirtualized mechanisms (e.g. hyper-v, xen) that will
>> + * always automatically get onlined. Memory will be unplugged
>> + * using ballooning, not by relying on the MOVABLE ZONE.
>> + * Not applicable to memhp_auto_online.
>> + */
>> +enum {
>> + MEMORY_BLOCK_NONE,
>> + MEMORY_BLOCK_NORMAL,
>> + MEMORY_BLOCK_STANDBY,
>> + MEMORY_BLOCK_PARAVIRT,
>> +};
>
> This does not seem like the best way to expose these.
>
> STANDBY, for instance, seems to be essentially a replacement for a check
> against running on s390 in userspace to implement a _typical_ s390
> policy. It seems rather weird to try to make the userspace policy
> determination easier by telling userspace about the typical s390 policy
> via the kernel.
Now comes the fun part: I am working on another paravirtualized memory
hotplug way for KVM guests, based on virtio ("virtio-mem").
These devices can potentially be used concurrently with
- s390x standby memory
- DIMMs
How should a policy in user space look like when new memory gets added
- on s390x? Not onlining paravirtualized memory is very wrong.
- on e.g. x86? Onlining memory to the MOVABLE zone is very wrong.
So the type of memory is very important here to have in user space.
Relying on checks like "isS390()", "isKVMGuest()" or "isHyperVGuest()"
to decide whether to online memory and how to online memory is wrong.
Only some specific memory types (which I call "normal") are to be
handled by user space.
For the other ones, we exactly know what to do:
- standby? don't online
- paravirt? always online to normal zone
I will add some more details as reply to Michal.
>
> As for the OOM issues, that sounds like something we need to fix by
> refusing to do (or delaying) hot-add operations once we consume too much
> ZONE_NORMAL from memmap[]s rather than trying to indirectly tell
> userspace to hurry thing along.
That is a moving target and doing that automatically is basically
impossible. You can add a lot of memory to the movable zone and
everything is fine. Suddenly a lot of processes are started - boom.
MOVABLE should only every be used if you expect an unplug. And for
paravirtualized devices, a "typical" unplug does not exist.
>
> So, to my eye, we need:
>
> +enum {
> + MEMORY_BLOCK_NONE,
> + MEMORY_BLOCK_STANDBY, /* the default */
> + MEMORY_BLOCK_AUTO_ONLINE,
> +};
auto-online is strongly misleading, that's why I called it "normal", but
I am open for suggestions. The information about devices handles fully
in the kernel - "paravirt" is key for me.
>
> and we can probably collapse NONE into AUTO_ONLINE because userspace
> ends up doing the same thing for both: nothing.
For external reasons, yes, for internal reasons no (see hmm/device
memory). In user space, we will never end up with MEMORY_BLOCK_NONE,
because there is no memory block.
>
>> struct memory_block {
>> unsigned long start_section_nr;
>> unsigned long end_section_nr;
>> @@ -34,6 +58,7 @@ struct memory_block {
>> int (*phys_callback)(struct memory_block *);
>> struct device dev;
>> int nid; /* NID for this memory block */
>> + int type; /* type of this memory block */
>> };
>
> Shouldn't we just be creating and using an actual named enum type?
>
That makes sense.
Thanks!
--
Thanks,
David / dhildenb
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
|