
Re: Event delivery and "domain blocking" on PVHv2



On 2020-06-19 01:43, Andrew Cooper wrote:
On 18/06/2020 11:13, Martin Lucina wrote:
On Monday, 15.06.2020 at 17:58, Andrew Cooper wrote:
On 15/06/2020 15:25, Martin Lucina wrote:
Hi,

puzzle time: In my continuing explorations of the PVHv2 ABIs for the
new MirageOS Xen stack, I've run into some issues with what looks like
missed deliveries of events on event channels.

While a simple unikernel that only uses the Xen console and
effectively does for (1..5) { printf("foo"); sleep(1); } works fine,
once I plug in the existing OCaml Xenstore and Netfront code, the
behaviour I see is that the unikernel hangs in random places, blocking
as if an event that should have been delivered has been missed.
You can see what is going on, event channel wise, with the 'e'
debug-key.  This will highlight cases such as the event channel being
masked and pending, which is a common guest bug ending up in this state.
Ok, based on your and Roger's suggestions I've made some changes:

1. I've dropped all the legacy PIC initialisation code from the Solo5
parts, written some basic APIC initialisation code and switched to using
HVMOP_set_evtchn_upcall_vector for upcall registration, along with setting
HVM_PARAM_CALLBACK_IRQ to 1 as suggested by Roger and done by Xen when
running as a guest. Commit at [1], nothing controversial there.

Well...

    uint64_t apic_base = rdmsrq(MSR_IA32_APIC_BASE);
    wrmsrq(MSR_IA32_APIC_BASE,
            apic_base | (APIC_BASE << 4) | MSR_IA32_APIC_BASE_ENABLE);
    apic_base = rdmsrq(MSR_IA32_APIC_BASE);
    if (!(apic_base & MSR_IA32_APIC_BASE_ENABLE)) {
        log(ERROR, "Solo5: Could not enable APIC or not present\n");
        assert(false);
    }

The only reason Xen doesn't crash your guest on that WRMSR is because
0xfee00080ull | (0xfee00000u << 4) == 0xfee00080ull, due to truncation
and 0xfe | 0xee == 0xfe.

Either way, the logic isn't correct.
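
For readers following along, the arithmetic above can be reproduced in isolation. This is a minimal sketch, not code from Solo5 or Xen: the constant name APIC_BASE_U32 and both helper functions are hypothetical, and it assumes a platform where unsigned int is 32 bits wide (as on x86).

```c
#include <stdint.h>

/* Hypothetical stand-in for APIC_BASE in the snippet above. The cast
 * makes the 32-bit width explicit (assumes unsigned int is 32 bits). */
#define APIC_BASE_U32 ((uint32_t)0xfee00000u)

/* The shift is evaluated in 32 bits, so the top nibble is discarded
 * before the result is widened for the OR with the 64-bit MSR value. */
static uint64_t or_with_32bit_shift(uint64_t msr)
{
    return msr | (APIC_BASE_U32 << 4);
}

/* Widening before the shift keeps all 36 bits of the shifted value,
 * which would set bits above the default APIC base address. */
static uint64_t or_with_64bit_shift(uint64_t msr)
{
    return msr | ((uint64_t)APIC_BASE_U32 << 4);
}
```

With the 32-bit shift, the input 0xfee00080 comes back unchanged, which is why the WRMSR happens to be harmless. With the widened shift the same input would gain bits above bit 31, i.e. a different base address.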

Oh, thanks. Don't you wish C had a "strict" mode where you could disable/warn
on implicit type promotion? I certainly do.


Xen doesn't support moving the APIC MMIO window (and almost certainly
never will, because the only thing which changes it is malware).  You
can rely on the default state being correct, because it is
architecturally specified.

Noted. I'll change the code to just verify that APIC_BASE is indeed FEE00000
at start of day and that the enable operation succeeded -- I like to keep the
code robust, e.g. against cut-n-pasting to somewhere else that might be used
in a non-Xen context later where the precondition may not hold.
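
A rough sketch of what such a check might look like, written as pure functions over the value read from IA32_APIC_BASE so it can be tested without touching the MSR. The names below are hypothetical and not taken from the Solo5 tree. The address mask covering bits 12..51 is a simplifying assumption (the real upper bound is the CPU's MAXPHYADDR), while bit 11 as the APIC global enable is as architecturally defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values follow the architectural layout of
 * IA32_APIC_BASE. The mask assumes at most 52 physical address bits. */
#define APIC_DEFAULT_PHYS_BASE  0xfee00000ull
#define APIC_BASE_MSR_ENABLE    (1ull << 11)
#define APIC_BASE_MSR_ADDR_MASK 0x000ffffffffff000ull

/* True if the base address field holds the architectural default. */
static bool apic_base_is_default(uint64_t msr)
{
    return (msr & APIC_BASE_MSR_ADDR_MASK) == APIC_DEFAULT_PHYS_BASE;
}

/* True if the APIC global enable bit is set. */
static bool apic_is_enabled(uint64_t msr)
{
    return (msr & APIC_BASE_MSR_ENABLE) != 0;
}
```

The caller would read the MSR once at start of day, assert apic_base_is_default(), set only the enable bit if needed, then re-read and assert apic_is_enabled().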

Martin


~Andrew



 

