
Re: [Xen-devel] [PATCH RFC v2] Add SUPPORT.md



On 25/10/17 11:59, George Dunlap wrote:
>>>>>>> +    Limit, x86 HVM: 128
>>>>>>> +    Limit, ARM32: 8
>>>>>>> +    Limit, ARM64: 128
>>>>>>> +
>>>>>>> +[XXX Andrew Cooper: Do want to add "Limit-Security" here for some of 
>>>>>>> these?]
>>>>>> 32 for each.  64 vcpu HVM guests can exert enough p2m lock pressure to
>>>>>> trigger a 5 second host watchdog timeout.
>>>>> Is that "32 for x86 PV and x86 HVM", or "32 for x86 HVM and ARM64"?  Or
>>>>> something else?
>>>> The former.  I'm not qualified to comment on any of the ARM limits.
>>>>
>>>> There are several non-trivial for_each_vcpu() loops in the domain_kill
>>>> path which aren't handled by continuations.  ISTR 128 vcpus is enough to
>>>> trip a watchdog timeout when freeing pagetables.
>>> I don't think 32 is a really practical limit.
>> What do you mean by practical here, and what evidence are you basing
>> this on?
>>
>> Amongst other things, there is an ABI boundary in Xen at 32 vcpus, and
>> given how often it is broken in Linux, it's clear that there isn't
>> regular testing happening beyond this limit.
> Is that true for dom0 as well?

Yes.  The problem is:

struct shared_info {
    struct vcpu_info vcpu_info[XEN_LEGACY_MAX_VCPUS];
...

and while there are ways to make a larger number of vcpus work, it
requires additional hypercalls to make alternate arrangements for the
vcpus beyond the 32 boundary, and these arrangements appear to be broken
more often than not around suspend/resume.
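
For reference, a minimal sketch of the extra registration a guest has to do
for vcpus at or above the legacy 32-entry array (types follow the public
vcpu.h interface; the HYPERVISOR_vcpu_op() wrapper is assumed from the
guest's hypercall glue, and setup_vcpu_info() is purely illustrative):

/* Sketch: registering vcpu_info for vcpus >= XEN_LEGACY_MAX_VCPUS.
 * Struct layout follows xen/include/public/vcpu.h; HYPERVISOR_vcpu_op()
 * is assumed to be the guest's usual hypercall wrapper. */
#include <stdint.h>

#define XEN_LEGACY_MAX_VCPUS      32
#define VCPUOP_register_vcpu_info 10

struct vcpu_register_vcpu_info {
    uint64_t mfn;     /* mfn of the page containing the vcpu_info area */
    uint32_t offset;  /* offset of vcpu_info within that page */
    uint32_t rsvd;    /* unused, must be zero */
};

extern int HYPERVISOR_vcpu_op(int cmd, unsigned int vcpu, void *arg);

/* Hypothetical per-cpu setup hook: vcpus below 32 can keep using
 * shared_info->vcpu_info[cpu]; anything above must register its own area,
 * and must do so again after resume, which is where breakage tends to hide. */
static int setup_vcpu_info(unsigned int cpu, uint64_t mfn, uint32_t offset)
{
    struct vcpu_register_vcpu_info info = {
        .mfn    = mfn,
        .offset = offset,
    };

    if (cpu < XEN_LEGACY_MAX_VCPUS)
        return 0;  /* legacy shared_info slot is sufficient */

    return HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
}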

>
>>> I'm inclined to say that if a rogue guest can crash a host with 33 vcpus, 
>>> we should issue an XSA
>>> and fix it.
>> The reason XenServer limits at 32 vcpus is that I can crash Xen with a
>> 64 vcpu HVM domain.  The reason it hasn't been my top priority to fix
>> this is because there is very little customer interest in pushing this
>> limit higher.
>>
>> Obviously, we should fix issues as and when they are discovered, and
>> work towards increasing the limits in the long term, but saying "this
>> limit seems too low, so let's provisionally set it higher" is
>> short-sighted and a recipe for more XSAs.
> OK -- I'll set this to 32 for now and see if anyone else wants to
> argue for a different value.

Sounds good to me.

>
>>>>>>> +
>>>>>>> +### x86 PV/Event Channels
>>>>>>> +
>>>>>>> +    Limit: 131072
>>>>>> Why do we call out event channel limits but not grant table limits?
>>>>>> Also, why is this x86?  The 2l and fifo ABIs are arch agnostic, as far
>>>>>> as I am aware.
>>>>> Sure, but I'm pretty sure that ARM guests don't (perhaps cannot?) use PV
>>>>> event channels.
>>>> This is mixing the hypervisor API/ABI capabilities with the actual
>>>> abilities of guests (which is also different to what Linux would use in
>>>> the guests).
>>> I'd say rather that you are mixing up the technical abilities of a
>>> system with user-facing features.  :-)  At the moment there is no reason
>>> for any ARM user to even think about event channels, so there's no
>>> reason to bother them with the technical details.  If at some point that
>>> changes, we can modify the document.
>> You do realise that receiving an event is entirely asymmetric with
>> sending an event?
>>
>> Even on ARM, {net,blk}front needs to speak event_{2l,fifo} with Xen to
>> bind and use its interdomain event channel(s) with {net,blk}back.
> I guess I didn't realize that (and just noticed Stefano's comment
> saying ARM uses event channels).
>
>>>> ARM guests, as well as x86 HVM with APICV (configured properly) will
>>>> actively want to avoid the guest event channel interface, because it's
>>>> slower.
>>>>
>>>> This solitary evtchn limit serves no useful purpose IMO.
>>> There may be a point to what you're saying: The event channel limit
>>> normally manifests itself as a limit on the number of guests / total
>>> devices.
>>>
>>> On the other hand, having these kinds of limits around does make sense.
>>>
>>> Let me give it some thoughts.  (If anyone else has any opinions...)
>> The event_fifo limit is per-domain, not system-wide.
>>
>> In general this only matters for a monolithic dom0, as it is one end of
>> each event channel in the system.
> Sure -- and that's why the limit used to matter.  It doesn't seem to
> matter at the moment because you now hit other resource bottlenecks
> before you hit the event channel limit.

This point highlights why conjoining the information is misleading.

A dom0 which (for whatever reason) chooses to use event_2l will still
hit the event channel bottleneck before other resource bottlenecks.

I'd expect the information to look a little more like this (formatting
subject to improvement):

## Event channels

### Event Channel 2-level ABI
Limit-theoretical (per guest): 1024 (32bit guest), 4096 (64bit guest)
Supported

### Event Channel FIFO ABI
Limit-theoretical (per guest): 131072
Supported

(We may want a shorthand for "this is the theoretical limit, and we
support it all the way up to the limit").
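
As a concrete cross-check of those numbers, a small sketch (the constants
mirror the public ABI headers; evtchn_2l_max() is illustrative arithmetic,
not a Xen function):

/* Sketch of where the per-guest limits above come from.
 *
 * 2-level ABI: shared_info has one word of pending bits per bit of the
 * per-vcpu selector word, so the limit is bits-per-long squared:
 * 32 * 32 = 1024 for a 32-bit guest, 64 * 64 = 4096 for a 64-bit one.
 *
 * FIFO ABI: ports are 17 bits wide, giving 1 << 17 = 131072 channels
 * (EVTCHN_FIFO_NR_CHANNELS in the public headers). */
#include <stdio.h>

#define EVTCHN_FIFO_NR_CHANNELS (1U << 17)

/* Illustrative helper, not a Xen function. */
static unsigned int evtchn_2l_max(unsigned int guest_bits_per_long)
{
    return guest_bits_per_long * guest_bits_per_long;
}

int main(void)
{
    printf("2l,  32-bit guest: %u\n", evtchn_2l_max(32));       /* 1024   */
    printf("2l,  64-bit guest: %u\n", evtchn_2l_max(64));       /* 4096   */
    printf("fifo, any guest:   %u\n", EVTCHN_FIFO_NR_CHANNELS); /* 131072 */
    return 0;
}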

>
>>>>>  * Guest serial console
>>>> Which consoles?  A qemu emulated-serial will be qemu's problem to deal
>>>> with.  Anything xenconsoled-based will be the guest's problem to deal
>>>> with, so pass.
>>> If the guest sets up extra consoles, these will show up in some
>>> appropriately-discoverable place after the migrate?
>> That is a complete can of worms.  Where do you draw the line?  log files
>> will get spliced across the migrate point, and `xl console $DOM` will
>> terminate, but whether this is "reasonably expected" is very subjective.
> Log files getting spliced and `xl console` terminating is, I think,
> reasonable to expect.  I was more talking about the "channel" feature
> (see xl.cfg man page on 'channels') -- will the device file show up on
> the remote dom0 after migration?

A cursory `git grep` doesn't show anything promising.

>
> But I suppose that feature doesn't really belong under "debugging,
> analysis, and crash post-mortem".
>
>>>>>  * Intel Platform QoS
>>>> Not exposed to guests at all, so it has no migration interaction atm.
>>> Well suppose a user limited a guest to using only 1k of L3 cache, and
>>> then saved and restored it.  Would she be surprised that the QoS limit
>>> disappeared?
>>>
>>> I think so, so we should probably call it out.
>> Oh - you mean the xl configuration.
>>
>> A quick `git grep` says that libxl_psr.c isn't referenced by any other
>> code in libxl, which means that the settings almost certainly get lost
>> on migrate.
> Can't you modify restrictions after the VM is started?  But either
> way, they won't be there after migrate, which may be surprising.

It appears that the libxl side of this is basically stateless, and just
shuffles settings between the xl cmdline and Xen.

>
>>>>>  * Remus
>>>>>  * COLO
>>>> These are both migration protocols themselves, so don't really fit into
>>>> this category.  Anything with works in normal migration should work when
>>>> using these.
>>> The question is, "If I have a VM which is using Remus, can I call `xl
>>> migrate/(save+restore)` on it?"
>> There is no such thing as "A VM using Remus/COLO" which isn't migrating.
>>
>> Calling `xl migrate` a second time is user error, and they get to keep
>> all the pieces.
>>
>>> I.e., suppose I have a VM on host A (local) being replicated to host X
>>> (remote) via REMUS.  Can I migrate that VM to host B (also local), while
>>> maintaining the replication to host X?
>>>
>>> Sounds like the answer is "no", so these are not compatible.
>> I think your expectations are off here.
>>
>> To move a VM which is using remus/colo, you let it fail-over to the
>> destination then start replicating it again to a 3rd location.
>>
>> Attempting to do what you describe is equivalent to `xl migrate $DOM $X
>> & xl migrate $DOM $Y` and expecting any pieces to remain intact.
>>
>> (As a complete guess) what will most likely happen is that one stream
>> will get memory corruption, and the other stream will take a hard error
>> on the source side, because both of them are trying to be the
>> controlling entity for logdirty mode.  One stream has logdirty turned
>> off behind its back, and the other gets a hard error for trying to
>> enable logdirty mode a second time.
> You're confusing mechanism with interface again.  Migration is the
> internal mechanism Remus and COLO use, but a user doesn't type "xl
> migrate" for any of them, so how are they supposed to know that it's
> the same mechanism being used?  And in any case, being able to migrate
> a replicated VM from one "local" host to another (as I've described)
> seems like a pretty cool feature to me.  If I had time and inclination
> to make COLO or Remus awesome I'd try to implement it.  From a user's
> perspective, I don't think it's at all a given that it doesn't work;
> so we need to tell them.

I don't think it's reasonable to expect people to be able to use
Remus/COLO without knowing that it is migration.

OTOH, you are correct that calling `xl migrate` on top of an
already-running Remus/COLO session (or indeed, on top of a plain
migrate) will cause everything to blow up, and there are no interlocks
to prevent such an explosion from happening.
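
Purely to illustrate the missing interlock (nothing below exists in libxl
today; the names and the per-domain flag are hypothetical), the toolstack
would need something along the lines of a single "outgoing stream" claim
per domain, so that a second migrate/Remus/COLO attempt fails cleanly
instead of racing the first for logdirty control:

/* Hypothetical sketch only -- no such interlock exists today.
 * Idea: a domain can have at most one controlling outgoing stream
 * (plain migrate, Remus or COLO); a second attempt is rejected up
 * front rather than both fighting over logdirty mode. */
#include <errno.h>
#include <stdbool.h>

struct domain_state {
    bool outgoing_stream_active;   /* hypothetical per-domain flag */
};

static int claim_outgoing_stream(struct domain_state *d)
{
    if (d->outgoing_stream_active)
        return -EBUSY;             /* second migrate/Remus/COLO rejected */
    d->outgoing_stream_active = true;
    return 0;
}

static void release_outgoing_stream(struct domain_state *d)
{
    d->outgoing_stream_active = false;  /* on completion or failure */
}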

~Andrew
