
Re: [Xen-devel] [xen-unstable test] 123379: regressions - FAIL

  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Wed, 13 Jun 2018 08:50:05 +0200
  • Cc: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 13 Jun 2018 06:50:16 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 13/06/18 08:11, Jan Beulich wrote:
>>>> On 12.06.18 at 17:58, <jgross@xxxxxxxx> wrote:
>> Trying to reproduce the problem in a limited test environment finally
>> worked: doing a loop of "xl save -c" produced the problem after 198
>> iterations.
>> I have asked a SUSE engineer doing kernel memory management if he
>> could think of something. His idea is that maybe some kthread could be
>> the reason for our problem, e.g. trying page migration or compaction
>> (at least on the test machine I've looked at compaction of mlocked
>> pages is allowed: /proc/sys/vm/compact_unevictable_allowed is 1).
> Iirc the primary goal of compaction is to make contiguous memory
> available for huge page allocations. PV not using huge pages, this is
> of no interest here. The secondary consideration of physically
> contiguous I/O buffer is an illusion only under PV, so perhaps not
> much more of an interest (albeit I can see drivers wanting to allocate
> physically contiguous buffers nevertheless now and then, but I'd
> expect this to be mostly limited to driver initialization and device hot
> add).
> So it is perhaps at least worth considering whether to turn off
> compaction/migration when running PV. But the problem would still
> need addressing then mid-term, as PVH Dom0 would have the same
> issue (and of course DomU, i.e. including HVM, can make hypercalls
> too, and hence would be affected as well, just perhaps not as
> visibly).

I think we should try to solve the problem by being aware of such
possibilities. Another potential source would be NUMA memory
migration (not currently in PV, of course). And who knows what else
will appear in the coming years.

>> In order to be really sure nothing in the kernel can temporarily
>> switch hypercall buffer pages read-only or invalid for the hypervisor
>> we'll have to modify the privcmd driver interface: it will have to
>> gain knowledge of which pages are handed over to the hypervisor as buffers
>> in order to be able to lock them accordingly via get_user_pages().
> So are you / is he saying that mlock() doesn't protect against such
> playing with process memory?

Right. mlock() only guarantees that you won't ever see a fault for such
a page in user mode. With proper locking in place, the kernel is still
free to migrate the page or make it temporarily inaccessible.
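
To make the distinction concrete, here is a minimal user-space sketch in C. It reads the sysctl mentioned above and locks a single page with mlock(). The helper names (compact_unevictable_allowed, alloc_mlocked_page, free_mlocked_page) are made up for illustration and are not part of libxencall or any Xen library. The point is only that a successful mlock() keeps the page resident; it says nothing about whether the kernel may later move it to a different frame.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Read /proc/sys/vm/compact_unevictable_allowed.
 * Returns the value found there, or -1 if the file is missing or
 * unreadable (e.g. a kernel built without compaction support).
 */
int compact_unevictable_allowed(void)
{
    FILE *f = fopen("/proc/sys/vm/compact_unevictable_allowed", "r");
    int val = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &val) != 1)
        val = -1;
    fclose(f);
    return val;
}

/*
 * Allocate one page-aligned, zeroed page and mlock() it.
 * Returns NULL on any failure. Hypothetical helper, for illustration.
 *
 * The lock keeps the page resident in RAM; it does not pin the page
 * to a fixed physical frame, so migration/compaction can still move it.
 */
void *alloc_mlocked_page(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *buf = NULL;

    if (page_size <= 0)
        return NULL;
    if (posix_memalign(&buf, (size_t)page_size, (size_t)page_size) != 0)
        return NULL;
    memset(buf, 0, (size_t)page_size);
    if (mlock(buf, (size_t)page_size) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Unlock and release a page obtained from alloc_mlocked_page(). */
void free_mlocked_page(void *buf)
{
    long page_size = sysconf(_SC_PAGESIZE);

    if (!buf || page_size <= 0)
        return;
    munlock(buf, (size_t)page_size);
    free(buf);
}
```

If compact_unevictable_allowed() returns 1, a buffer obtained this way is still a candidate for compaction, which matches what was observed on the test machine.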

> Teaching the privcmd driver of all
> the indirections in hypercall request structures doesn't look very
> attractive (or maintainable). Or are you thinking of the caller
> providing sideband information describing the buffers involved,
> perhaps along the lines of how dm_op was designed?

I thought about that, yes. libxencall already has all the needed data
for that. Another possibility would be a dedicated ioctl for registering
a hypercall buffer (or some of them).
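
A rough sketch of what such sideband information could look like, written as plain user-space C so it can be compiled and tested on its own. Everything here is hypothetical: struct hcall_buf_desc, pages_spanned() and total_pages_to_pin() are invented names, loosely following the pointer/size pairs used for dm_op, and none of it is an existing privcmd or libxencall interface. It only shows that, given such descriptors, the driver could work out which pages it would have to lock without understanding the hypercall argument structures themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sideband descriptor: one user buffer the hypercall touches. */
struct hcall_buf_desc {
    void  *uptr;   /* start of the buffer in the caller's address space */
    size_t size;   /* length of the buffer in bytes */
};

/*
 * Number of pages covered by [uptr, uptr + size).
 * Returns 0 for a NULL pointer, a zero-length buffer or a zero page size.
 * Address wrap-around is not handled in this sketch.
 */
size_t pages_spanned(const void *uptr, size_t size, size_t page_size)
{
    uintptr_t start = (uintptr_t)uptr;
    uintptr_t first, last;

    if (uptr == NULL || size == 0 || page_size == 0)
        return 0;

    first = start / page_size;
    last = (start + size - 1) / page_size;
    return (size_t)(last - first + 1);
}

/*
 * Sum of pages over all described buffers, i.e. how many pages the
 * driver would need to lock before issuing the hypercall.
 * Returns 0 if any descriptor is invalid. Pages shared by two buffers
 * are counted twice; a real implementation might deduplicate them.
 */
size_t total_pages_to_pin(const struct hcall_buf_desc *bufs,
                          unsigned int num, size_t page_size)
{
    size_t total = 0;
    unsigned int i;

    for (i = 0; i < num; i++) {
        size_t n = pages_spanned(bufs[i].uptr, bufs[i].size, page_size);

        if (n == 0)
            return 0;
        total += n;
    }
    return total;
}
```

With 4096-byte pages, a 32-byte buffer starting at 0x2ff0 straddles a page boundary and therefore needs two pages locked, which is the kind of detail the driver would have to get right in either variant.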

> There's another option, but that has potentially severe drawbacks
> too: Instead of returning -EFAULT on buffer access issues, we
> could raise #PF on the very hypercall insn. Maybe something to
> consider as an opt-in for PV/PVH, and as default for HVM.

Hmm, I'm not sure this will solve any problem. I'm not aware that it
is considered good practice to just access a user buffer from kernel
without using copyin()/copyout() when you haven't locked the page(s)
via get_user_pages(), even if the buffer was mlock()ed. Returning
-EFAULT is the right thing to do, I believe.

