[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 3/3] Add limited support of VMware's hyper-call



>>> On 03.09.14 at 02:55, <dslutz@xxxxxxxxxxx> wrote:
> On 09/02/14 04:16, Jan Beulich wrote:
>> As I think was said on v1 already - this should be split into smaller
>> pieces if at all possible. I'm therefore not going to do a full review
>> at this time, just give a couple of remarks.
> 
> The last time (v1's) split was much worse (and so I think that "split 
> into smaller"
> was not said). I could check be most were "combine this patch with that 
> patch";
> and more comment message.
> 
> Having been over this code many times in the last month, a possible
> quick split (which would not delay v3 too long) would be:
> 
> 1) Add simple vmport commands like BDOOR_CMD_GETVERSION
> 2) Add the hypervisor side of VMware guest info.
> 3) Add the hvm params that come from VMware guest info.
> 4) Add the hyper calls for libxc to handle VMware guest info.

One thing I'd certainly like to see split out is the save/restore
code. And the second - general - consideration would be to
split hypervisor from tools side changes as much as possible.

> And should I add the unit tests also (as optional)?

No idea - that's a tools side thing iirc, and hence a question to the
tools maintainers.

>>> @@ -579,6 +580,39 @@ long arch_do_domctl(
>>>           }
>>>           break;
>>>   
>>> +        case XEN_DOMCTL_SENDTRIGGER_VTPOWER:
>>> +        {
>>> +            ret = -EINVAL;
>>> +            if ( is_hvm_domain(d) )
>>> +            {
>>> +                ret = 0;
>>> +                vmport_ctrl_send(&d->arch.hvm_domain, "OS_Halt");
>>> +            }
>>> +        }
>>> +        break;
>>> +
>>> +        case XEN_DOMCTL_SENDTRIGGER_VTREBOOT:
>>> +        {
>>> +            ret = -EINVAL;
>>> +            if ( is_hvm_domain(d) )
>>> +            {
>>> +                ret = 0;
>>> +                vmport_ctrl_send(&d->arch.hvm_domain, "OS_Reboot");
>>> +            }
>>> +        }
>>> +        break;
>>> +
>>> +        case XEN_DOMCTL_SENDTRIGGER_VTPING:
>>> +        {
>>> +            ret = -EINVAL;
>>> +            if ( is_hvm_domain(d) )
>>> +            {
>>> +                ret = 0;
>>> +                vmport_ctrl_send(&d->arch.hvm_domain, "ping");
>>> +            }
>>> +        }
>>> +        break;
>> Rather than adding three new domctls, wouldn't a single VMware
>> one with suitable sub-operations do?
> 
> As far as I can tell these are already a "sub-operation":

And right you are. Never mind the comment then.

>> And in any event I'm rather uncomfortable about this getting
>> enabled unconditionally, also due to (but not limited to this) the
>> up front allocation of various memory blocks that may end up
>> never being needed. The main issue I see with this approach is
>> that guests could end up being misguided into thinking they're
>> running on VMware when in fact they have code in place to
>> optimize when running on Xen. We had such detection ordering
>> issues with Linux checking for both Hyper-V and Xen.
> 
> 
> Ok, I do feel strongly that this would need to be independent on
> vmware_hw "version". And so I will add a new xl.cfg item to control this.
> 
> QEMU on 5/19/2014 added a way to turn it's version off:
> 
> +@item vmport=on|off
> +Enables emulation of VMWare IO port, for vmmouse etc. (enabled by default)
> 
> So should I name it vmware_vmport or just vmport ( I lean to just vmport )?
> and unlike QEMU I am fine with the default of off.

I'd prefer the former as being more explicit (vmport alone doesn't
really provide a connection to VMware), but this again is something
to discuss with the tools maintainers.

>>> @@ -1532,6 +1561,17 @@ int hvm_domain_initialise(struct domain *d)
>>>       stdvga_deinit(d);
>>>       vioapic_deinit(d);
>>>    fail1:
>>> +    if ( d->arch.hvm_domain.vmport_data )
>>> +    {
>>> +        struct vmport_state *vs = d->arch.hvm_domain.vmport_data;
>>> +        int idx;
>>> +
>>> +        for (idx = 0; idx < vs->used_guestinfo; idx++)
>> You'll have to go through and fix coding style issues.
> 
> Do to the many coding styles I have used in the last 20 years, I can have
> style "blindness". I do not find a style issue here based on CODING_STYLE.
> All hints are welcome.

Just look at the difference between you if() and your for() - the
latter is lacking blanks inside the parentheses. And that difference
appeared to be a pretty consistent thing throughout the code I
looked a little more closely at.

>>> @@ -106,6 +107,7 @@ MAKE_INSTR(VMSAVE, 3, 0x0f, 0x01, 0xdb);
>>>   MAKE_INSTR(STGI,   3, 0x0f, 0x01, 0xdc);
>>>   MAKE_INSTR(CLGI,   3, 0x0f, 0x01, 0xdd);
>>>   MAKE_INSTR(INVLPGA,3, 0x0f, 0x01, 0xdf);
>>> +MAKE_INSTR(IN,     1, 0xed);
>> This name is ambiguous.
> 
> There are 4 opcodes and 6 instructions. Not at all sure how to name
> the one I need. Here is the info about the IN instruction.

I'd name it INL_DX, but one of the secondary problems you
may need to solve is that unlike for other opcodes you do not
want the operand size override ignored here.

>>> +static void svm_vmexit_gp_intercept(struct cpu_user_regs *regs, struct 
>>> vcpu 
> *v)
>>> +{
>>> +    struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
>>> +    unsigned long inst_len = __get_instruction_length(v, INSTR_IN);
>>> +
>>> +    regs->error_code = vmcb->exitinfo1;
>>> +    if ( hvm_long_mode_enabled(v) )
>>> +        HVMTRACE_LONG_C4D(TRAP_GP, TRAP_gp_fault, inst_len,
>>> +                          regs->error_code,
>>> +                          TRC_PAR_LONG(vmcb->exitinfo2));
>>> +    else
>>> +        HVMTRACE_C4D(TRAP_GP, TRAP_gp_fault, inst_len,
>>> +                     regs->error_code, vmcb->exitinfo2);
>>> +
>>> +    if ( inst_len <= 1 && (regs->rdx & 0xffff) == VMPORT_PORT &&
>> What would inst_len being zero mean here?
> 
> It use to mean that AMD was missing the feature:
> 
> 
> (XEN) [2014-08-14 20:17:28] - Next-RIP Saved on #VMEXIT
> 
> and so I needed to look at the opcode.

__get_instruction_length_from_list() returns using the value resulting
from that feature only when the length is non-zero.

> Now that I am using __get_instruction_length() I think it could be changed
> to a 1. However I can only test on an AMD server that does have this
> feature. Will switch to == 1 and hope for the best.
> 
> As far as I can tell __get_instruction_length() does not insure that the 
> opcode
> list passed is found, just that it might be.

Right, and zero is an indication that it wasn't found. Also I just
noticed there's a gdprintk() in that event, which for all other
cases the function gets used for is quite fine. #GP, however,
can result from various instructions, and hence you don't want
a message printed each time it wasn't due to "inl %dx, %eax".

>>> +         vmcb->exitinfo2 == 0 && regs->error_code == 0 )
>>> +    {
>>> +        uint32_t magic = regs->rax;
>>> +
>>> +        if ( magic == VMPORT_MAGIC )
>> This ought to be folded with the previous if(), at once reducing
>> code redundancy further down in the function.
> 
> OK. But that does make the debug log message more complex to code.

Maybe, albeit (as hinted at before) I question the use of these
anyway.

>>> +        {
>>> +            unsigned char bytes[1] = { 0 };
>> Pointless initializer (and it being an array looks suspicious too).
> 
> Will drop the initializer. hvm_fetch_from_guest_virt_nofault() does take 
> an array.
> So it is better to say "&tmp, regs->rip, 1" then "bytes, regs->rip, 1" ?

Yes, I think so.

> Also by dropping the initializer, you are stating that
> hvm_fetch_from_guest_virt_nofault() does initialize it's output on error (
> because the debug log output does include bytes[0]).

Existing debug log output, or one you add? It's a mistake in any case,
I'm just trying to understand whether a fix might be one order
independent of your patch.

>>   Speaking of which - why can't you just define
>> a new DBG_LEVEL_VMPORT and use the existing HVM_DBG_LOG()?
> 
> I did define 23 bits to control various part of the debug. A lot of this
> is because there are many times you go through the code. Basicly
> a string is 1st passed a length, and the 4 bytes at a time until the
> length is handled, followed by state changes...

Which again raises the question of the value of all this debugging
output. Whether or not a niche feature, I don't think it should be
significantly more verbose than the base code we have.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.