Re: Xen panic due to xstate mismatch
 
 
I have attached the output of `xl dmesg`. This is from the Xen 4.19.1 I rebuilt, but I have the same issue with master (just for info).
And, as you said earlier, it works with the default installation, because the first line of its log is: `(XEN) [0000001476779e16] Xen version 4.19.1 (Debian 4.19.1-1+b2) ( pkg-xen-devel@xxxxxxxxxxxxxxxxxxxxxxx) (x86_64-linux-gnu-gcc (Debian 14.2.0-14) 14.2.0) debug=n Mon Jan 27 15:31:22 UTC 2025`
It is indeed compiled with debug=n, while mine has debug set to yes. That explains why the default one boots. What seemed strange is that, for my build, I copied the default `/boot/xen-4.19-amd64.config` to `.config` in the directory where I ran the build, so I thought I was missing something. Oh wait, I'm stupid: I copied it into the top-level directory and not the `xen/` directory, so the build in fact generated a default config with debug enabled. Well, actually this error is interesting because it lets me dive into the code :)
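
For readers following along, here is a minimal sketch of how the two sizes in the panic message relate. It is not Xen's `xstate_uncompressed_size()`; the struct, table, and function names are hypothetical. The only inputs are the CPUID leaf 0xd subleaf 2 values from the raw policy in the `xen-cpuid -p` output quoted further down (eax = 0x100 is the size of the AVX component, ebx = 0x240 is its offset), plus the architectural 512-byte legacy area and 64-byte XSAVE header.

```c
#include <stdint.h>

/* 512-byte legacy x87/SSE area plus the 64-byte XSAVE header. */
#define XSAVE_LEGACY_AND_HDR 0x240u
#define NR_COMPONENTS 3u

/* Hypothetical record for one extended state component. */
struct xcomp {
    uint32_t size;   /* CPUID.0xd[i].eax */
    uint32_t offset; /* CPUID.0xd[i].ebx */
};

/* Only component 2 (AVX) appears in the raw policy of this thread. */
static const struct xcomp comp[NR_COMPONENTS] = {
    [2] = { .size = 0x100, .offset = 0x240 },
};

/*
 * Uncompressed XSAVE area size for a given set of enabled states:
 * the end of the highest enabled component, and never less than
 * the legacy area plus header.
 */
static uint32_t uncompressed_size(uint64_t states)
{
    uint32_t size = XSAVE_LEGACY_AND_HDR;

    for (unsigned int i = 2; i < NR_COMPONENTS; i++) {
        if (!((states >> i) & 1))
            continue;

        uint32_t end = comp[i].offset + comp[i].size;
        if (end > size)
            size = end;
    }

    return size;
}
```

Under these assumptions, `uncompressed_size(0x3)` gives 0x240 and `uncompressed_size(0x7)` gives 0x340, which are the "xen size" and "hw size" from the panic: the reported hardware size for states 0x3 looks like the size one would expect with AVX also enabled.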
 
 
 Can you also collect the `xl dmesg` output and attach it?
 
I think this is a VirtualBox bug, but I'm confused as to why Xen has 
decided to turn off AVX. 
 
~Andrew 
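
To make the "turned off AVX" remark concrete, the following sketch compares the leaf 1 ECX value of the raw policy (0xf6fa3203) with that of the host policy (0xc6fa2203), both taken from the `xen-cpuid -p` output quoted below. The helper names are hypothetical; the bit positions are the architectural ones for CPUID leaf 1 ECX. It only shows which bits differ, not why Xen cleared them.

```c
#include <stdint.h>
#include <stdio.h>

/* CPUID leaf 1 ECX feature bits (architectural positions). */
#define CPUID1_ECX_FMA  (1u << 12)
#define CPUID1_ECX_AVX  (1u << 28)
#define CPUID1_ECX_F16C (1u << 29)

/* Hypothetical helper: bits set in the raw policy but clear in the host policy. */
static uint32_t cleared_bits(uint32_t raw, uint32_t host)
{
    return raw & ~host;
}

/* Print the names of the cleared bits this sketch knows about; returns how many. */
static unsigned int print_cleared(uint32_t diff)
{
    unsigned int n = 0;

    if (diff & CPUID1_ECX_FMA)  { puts("FMA");  n++; }
    if (diff & CPUID1_ECX_AVX)  { puts("AVX");  n++; }
    if (diff & CPUID1_ECX_F16C) { puts("F16C"); n++; }

    return n;
}
```

Calling `print_cleared(cleared_bits(0xf6fa3203u, 0xc6fa2203u))` names FMA, AVX and F16C. This fits the other differences visible in the output: leaf 7 EBX drops bit 5 (AVX2) and leaf 0xd subleaf 0 EAX goes from 0x7 to 0x3.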
 
On 02/02/2025 4:01 pm, Guillaume wrote: 
> Yes sure I can collect the output. As you said the change is good 
> enough to start the dom0 without errors (at least no apparent errors :). 
> ``` 
> Xen reports there are maximum 120 leaves and 2 MSRs 
> Raw policy: 32 leaves, 2 MSRs 
>  CPUID: 
>   leaf     subleaf  -> eax      ebx      ecx      edx 
>   00000000:ffffffff -> 00000016:756e6547:6c65746e:49656e69 
>   00000001:ffffffff -> 000806c1:00020800:f6fa3203:178bfbff 
>   00000002:ffffffff -> 00feff01:000000f0:00000000:00000000 
>   00000004:00000000 -> 04000121:02c0003f:0000003f:00000000 
>   00000004:00000001 -> 04000122:01c0003f:0000003f:00000000 
>   00000004:00000002 -> 04000143:04c0003f:000003ff:00000000 
>   00000004:00000003 -> 04000163:02c0003f:00003fff:00000004 
>   00000006:ffffffff -> 00000004:00000000:00000000:00000000 
>   00000007:00000000 -> 00000000:208c2569:00000000:30000400 
>   0000000b:00000000 -> 00000000:00000001:00000100:00000000 
>   0000000b:00000001 -> 00000001:00000002:00000201:00000000 
>   0000000d:00000000 -> 00000007:00000000:00000340:00000000 
>   0000000d:00000002 -> 00000100:00000240:00000000:00000000 
>   80000000:ffffffff -> 80000008:00000000:00000000:00000000 
>   80000001:ffffffff -> 00000000:00000000:00000121:28100800 
>   80000002:ffffffff -> 68743131:6e654720:746e4920:52286c65 
>   80000003:ffffffff -> 6f432029:54286572:6920294d:31312d37 
>   80000004:ffffffff -> 37473538:33204020:4730302e:00007a48 
>   80000006:ffffffff -> 00000000:00000000:01007040:00000000 
>   80000007:ffffffff -> 00000000:00000000:00000000:00000100 
>   80000008:ffffffff -> 00003027:00000000:00000000:00000000 
>  MSRs: 
>   index    -> value 
>   000000ce -> 0000000000000000 
>   0000010a -> 0000000000000000 
> Host policy: 30 leaves, 2 MSRs 
>  CPUID: 
>   leaf     subleaf  -> eax      ebx      ecx      edx 
>   00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69 
>   00000001:ffffffff -> 000806c1:00020800:c6fa2203:178bfbff 
>   00000002:ffffffff -> 00feff01:000000f0:00000000:00000000 
>   00000004:00000000 -> 04000121:02c0003f:0000003f:00000000 
>   00000004:00000001 -> 04000122:01c0003f:0000003f:00000000 
>   00000004:00000002 -> 04000143:04c0003f:000003ff:00000000 
>   00000004:00000003 -> 04000163:02c0003f:00003fff:00000004 
>   00000007:00000000 -> 00000000:208c2549:00000000:30000400 
>   0000000d:00000000 -> 00000003:00000000:00000240:00000000 
>   80000000:ffffffff -> 80000008:00000000:00000000:00000000 
>   80000001:ffffffff -> 00000000:00000000:00000121:28100800 
>   80000002:ffffffff -> 68743131:6e654720:746e4920:52286c65 
>   80000003:ffffffff -> 6f432029:54286572:6920294d:31312d37 
>   80000004:ffffffff -> 37473538:33204020:4730302e:00007a48 
>   80000006:ffffffff -> 00000000:00000000:01007040:00000000 
>   80000007:ffffffff -> 00000000:00000000:00000000:00000100 
>   80000008:ffffffff -> 00003027:00000000:00000000:00000000 
>  MSRs: 
>   index    -> value 
>   000000ce -> 0000000000000000 
>   0000010a -> 0000000000000000 
> PV Max policy: 57 leaves, 2 MSRs 
>  CPUID: 
>   leaf     subleaf  -> eax      ebx      ecx      edx 
>   00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69 
>   00000001:ffffffff -> 000806c1:00020800:c6f82203:1789cbf5 
>   00000002:ffffffff -> 00feff01:000000f0:00000000:00000000 
>   00000004:00000000 -> 04000121:02c0003f:0000003f:00000000 
>   00000004:00000001 -> 04000122:01c0003f:0000003f:00000000 
>   00000004:00000002 -> 04000143:04c0003f:000003ff:00000000 
>   00000004:00000003 -> 04000163:02c0003f:00003fff:00000004 
>   00000007:00000000 -> 00000002:208c0109:00000000:20000400 
>   0000000d:00000000 -> 00000003:00000000:00000240:00000000 
>   80000000:ffffffff -> 80000021:00000000:00000000:00000000 
>   80000001:ffffffff -> 00000000:00000000:00000123:28100800 
>   80000002:ffffffff -> 68743131:6e654720:746e4920:52286c65 
>   80000003:ffffffff -> 6f432029:54286572:6920294d:31312d37 
>   80000004:ffffffff -> 37473538:33204020:4730302e:00007a48 
>   80000006:ffffffff -> 00000000:00000000:01007040:00000000 
>   80000007:ffffffff -> 00000000:00000000:00000000:00000100 
>   80000008:ffffffff -> 00003027:00000000:00000000:00000000 
>  MSRs: 
>   index    -> value 
>   000000ce -> 0000000000000000 
>   0000010a -> 0000000010020004 
> HVM Max policy: 4 leaves, 2 MSRs 
>  CPUID: 
>   leaf     subleaf  -> eax      ebx      ecx      edx 
>  MSRs: 
>   index    -> value 
>   000000ce -> 0000000000000000 
>   0000010a -> 0000000000000000 
> PV Default policy: 30 leaves, 2 MSRs 
>  CPUID: 
>   leaf     subleaf  -> eax      ebx      ecx      edx 
>   00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69 
>   00000001:ffffffff -> 000806c1:00020800:c6d82203:1789cbf5 
>   00000002:ffffffff -> 00feff01:000000f0:00000000:00000000 
>   00000004:00000000 -> 04000121:02c0003f:0000003f:00000000 
>   00000004:00000001 -> 04000122:01c0003f:0000003f:00000000 
>   00000004:00000002 -> 04000143:04c0003f:000003ff:00000000 
>   00000004:00000003 -> 04000163:02c0003f:00003fff:00000004 
>   00000007:00000000 -> 00000000:208c0109:00000000:20000400 
>   0000000d:00000000 -> 00000003:00000000:00000240:00000000 
>   80000000:ffffffff -> 80000008:00000000:00000000:00000000 
>   80000001:ffffffff -> 00000000:00000000:00000121:28100800 
>   80000002:ffffffff -> 68743131:6e654720:746e4920:52286c65 
>   80000003:ffffffff -> 6f432029:54286572:6920294d:31312d37 
>   80000004:ffffffff -> 37473538:33204020:4730302e:00007a48 
>   80000006:ffffffff -> 00000000:00000000:01007040:00000000 
>   80000008:ffffffff -> 00003027:00000000:00000000:00000000 
>  MSRs: 
>   index    -> value 
>   000000ce -> 0000000000000000 
>   0000010a -> 0000000000000000 
> HVM Default policy: 4 leaves, 2 MSRs 
>  CPUID: 
>   leaf     subleaf  -> eax      ebx      ecx      edx 
>  MSRs: 
>   index    -> value 
>   000000ce -> 0000000000000000 
>   0000010a -> 0000000000000000 
> ``` 
> 
> Guillaume 
> 
> On Sun, Feb 2, 2025 at 4:32 PM Andrew Cooper 
> <andrew.cooper3@xxxxxxxxxx> wrote: 
> 
>     This is a sanity check that an algorithm in Xen matches hardware.  
>     It is only compiled into debug builds by default.  
> 
>     Given that you're running under VirtualBox, I have a suspicion as
>     to what's wrong.
>
>     Can you collect the full `xen-cpuid -p` output from within your
>     environment?  I don't believe your suggested code change is
>     correct, but it will be good enough to get these diagnostics.
> 
>     ~Andrew 
> 
>     On Sun, 2 Feb 2025, 15:32 Guillaume, <thouveng@xxxxxxxxx> wrote: 
> 
>         Hello, 
> 
>          I'd like to report an issue I encountered when building Xen
>         from source. To give you some context: during the Xen winter
>         meetup in Grenoble a few days ago, there was a discussion about
>         strengthening collaboration between Xen and academia. One
>         issue raised by a professor was that Xen is harder for
>         students to install and experiment with than KVM. In
>         response, it was mentioned that the Debian packages are quite
>         decent. This motivated me to try installing and playing with
>         Xen myself. While I am familiar with Xen (I work on the XAPI
>         toolstack at Vates), I'm not deeply familiar with its
>         internals, so this seemed like a good learning opportunity and
>         maybe some content for a few blog posts :).
> 
>          I set up a Debian testing VM on VirtualBox and installed Xen
>         from packages. Everything worked fine: GRUB was updated, I
>         rebooted, and I had a functional Xen setup with xl running in
>         Dom0.
>          Next I downloaded the latest version of Xen from xenbits.org
>         <http://xenbits.org> and built only the hypervisor (no tools,
>         no stubdom), using the same configuration as the Debian
>         package (which is for Xen 4.19). After updating GRUB and
>         rebooting, Xen failed to boot. Fortunately, I was able to
>         capture the following error via `ttyS0`:
>         ``` 
>         (XEN) [0000000d2c23739a] xstate: size: 0x340 and states: 0x7 
>         (XEN) [0000000d2c509c1d] 
>         (XEN) [0000000d2c641ffa] **************************************** 
>         (XEN) [0000000d2c948e3b] Panic on CPU 0: 
>         (XEN) [0000000d2cb349bb] XSTATE 0x0000000000000003, uncompressed hw size 0x340 != xen size 0x240
>         (XEN) [0000000d2cfc5786] **************************************** 
>         (XEN) [0000000d2d308c24] 
>         ``` 
>         From my understanding, the hardware xstate size (`hw_size`) 
>         represents the maximum memory required for the `XSAVE/XRSTOR` 
>         save area, while `xen_size` is computed by summing the space 
>         required for the enabled features. In `xen/arch/x86/xstate.c`, 
>         if these sizes do not match, Xen panics. However, wouldn’t it 
>         be correct for `xen_size` to be **less than or equal to** 
>         `hw_size` instead of exactly matching? 
> 
>         I tested the following change: 
>         ``` 
>         --- a/xen/arch/x86/xstate.c
>         +++ b/xen/arch/x86/xstate.c
>         @@ -710,7 +710,7 @@ static void __init check_new_xstate(struct xcheck_state *s, uint64_t new)
>               */
>              xen_size = xstate_uncompressed_size(s->states & X86_XCR0_STATES);
>
>         -    if ( xen_size != hw_size )
>         +    if ( xen_size > hw_size )
>                  panic("XSTATE 0x%016"PRIx64", uncompressed hw size %#x != xen size %#x\n",
>                        s->states, hw_size, xen_size);
>         ``` 
>         With this change, Xen boots correctly, but I may be missing 
>         some side effects... 
>         Additionally, I am confused as to why this issue does *not* 
>         occur with the default Debian Xen package. Even when I rebuild 
>         Xen *4.19.1* from source (the same version as the package), I 
>         still encounter the issue. 
>         So I have a few questions:
>         - Is my understanding correct that `xen_size <= hw_size`
>         should be allowed?
>         - Are there any potential side effects of this change?
>         - Bonus: does anyone have an explanation for why the issue
>         does not occur with the packaged version of Xen but does with
>         a self-built version?
> 
>         I hope this wasn't too long, and thanks for taking the time to
>         read it.
>         Best regards, 
> 
>         Guillaume 
> 
 
  
Attachment:
xl_dmesg_Xen_4.19.1.txt 
Description: Text document 
 
    