Re: Windows domu DRBD backend problem
Hi Attila,
On 23/04/2025 19:28, Kotán Attila wrote:
> Hello Tu Dinh,
> Update info:
> I bought an NVMe drive today for testing, and unfortunately the problem
> is present with an NVMe DRBD backend too.
> I previously tested this scenario with a primary node that is not a DELL
> server (a desktop-class machine).
> The issue definitely seems related to DELL servers only, or maybe to the
> multiprocessor environment. I use only DELL servers, so I have no
> information about other vendors.
>
>
> Thank you for your advice.
> I have tried to capture all the relevant info / output:
>
> DRBD configs
> - global_common.conf
> -----
> global {
> usage-count yes;
> udev-always-use-vnr; # treat implicit the same as explicit volumes
> }
>
> common {
> handlers {
> }
>
> startup {
> }
>
> options {
> }
>
> disk {
> on-io-error detach;
> resync-rate 160M;
> }
>
> net {
> }
> }
> -----
>
> - w2022_system.res
> -----
> resource w2022_system {
> protocol C;
>
> net {
> }
>
> syncer {
> }
>
> on xen18 {
> device /dev/drbd0;
> disk /dev/NVME01/w2022_system;
> address 172.16.16.8:7800;
> meta-disk internal;
> }
>
> on xen16 {
> device /dev/drbd0;
> disk /dev/VG02/w2022_system;
> address 172.16.16.6:7800;
> meta-disk internal;
> }
> }
> -----
>
> Domu config
> - w2022.cfg
> -----
> name = 'w2022'
> builder = 'hvm'
> memory = 16384
> #shadow_memory = 8
> vcpus=16
> uuid = 'cac0559e-06fd-42fc-a92f-fa2d8cadaff1'
> vif = [ 'bridge=xenbr0, mac=00:11:6c:1c:49:17' ]
> disk = [ 'drbd:w2022_system,xvda,w', ]
> boot='dc'
> vnc=1
> vncunused=0
> vnclisten = '0.0.0.0'
> vncdisplay=2
> stdvga=1
> on_poweroff = 'destroy'
> on_reboot = 'restart'
> on_crash = 'restart'
> usb=1
> usbdevice=['tablet']
> -----
>
> Nothing special in the config.
>
> I tested with Domain-0 running Debian 10, 11, 12, and testing (possibly
> trixie).
> I tried Windows 7 and Windows Server 2022 as the domU.
> The test environment is:
> - Node1 (xen18) DELL T630 with PERC H730 (1G, BBU) or 1TB NVME as primary
> - Node2 (xen16) DELL R730XD with PERC H730mini (1G, BBU) as secondary
>
> I have the same problem in another environment with two DELL R640 servers.
>
> The node1 kern.log with a PERC (DELL RAID controller) virtual disk as
> DRBD backend:
> -----
> Apr 23 12:46:22 xen18 kernel: [ 574.527385] drbd w2022_system: PingAck
> did not arrive in time.
> Apr 23 12:46:22 xen18 kernel: [ 574.527464] drbd w2022_system:
> peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure )
> pdsk( UpToDate -> DUnknown )
> Apr 23 12:46:22 xen18 kernel: [ 574.527600] block drbd0: new current
> UUID 19B04B0803CC3C87:CB60C112A5C9EA5D:EBEF8CB4948C160D:EBEE8CB4948C160D
> Apr 23 12:46:22 xen18 kernel: [ 574.527661] drbd w2022_system:
> ack_receiver terminated
> Apr 23 12:46:22 xen18 kernel: [ 574.527665] drbd w2022_system:
> Terminating drbd_a_w2022_sy
> Apr 23 12:46:22 xen18 kernel: [ 574.583747] drbd w2022_system:
> Connection closed
> Apr 23 12:46:22 xen18 kernel: [ 574.584035] drbd w2022_system:
> conn( NetworkFailure -> Unconnected )
> Apr 23 12:46:22 xen18 kernel: [ 574.584038] drbd w2022_system: receiver
> terminated
> Apr 23 12:46:22 xen18 kernel: [ 574.584041] drbd w2022_system:
> Restarting receiver thread
> Apr 23 12:46:22 xen18 kernel: [ 574.584043] drbd w2022_system: receiver
> (re)started
> Apr 23 12:46:22 xen18 kernel: [ 574.584052] drbd w2022_system:
> conn( Unconnected -> WFConnection )
> -----
>
> The node1 kern.log with an NVMe DRBD backend:
> -----
> Apr 23 10:51:18 xen18 kernel: [ 912.800847] drbd w2022_system: PingAck
> did not arrive in time.
> Apr 23 10:51:18 xen18 kernel: [ 912.800930] drbd w2022_system:
> peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure )
> pdsk( UpToDate -> DUnknown )
> Apr 23 10:51:18 xen18 kernel: [ 912.807793] block drbd0: new current
> UUID 2361269D52925DF1:AB440CB0842F155D:25CE81C2C028E09E:74D2255AB30D4115
> Apr 23 10:51:18 xen18 kernel: [ 912.811762] drbd w2022_system:
> ack_receiver terminated
> Apr 23 10:51:18 xen18 kernel: [ 912.811768] drbd w2022_system:
> Terminating drbd_a_w2022_sy
> Apr 23 10:51:18 xen18 kernel: [ 912.853400] drbd w2022_system:
> Connection closed
> Apr 23 10:51:18 xen18 kernel: [ 912.853723] drbd w2022_system:
> conn( NetworkFailure -> Unconnected )
> Apr 23 10:51:18 xen18 kernel: [ 912.853727] drbd w2022_system: receiver
> terminated
> Apr 23 10:51:18 xen18 kernel: [ 912.853729] drbd w2022_system:
> Restarting receiver thread
> Apr 23 10:51:18 xen18 kernel: [ 912.853732] drbd w2022_system: receiver
> (re)started
> Apr 23 10:51:18 xen18 kernel: [ 912.853740] drbd w2022_system:
> conn( Unconnected -> WFConnection )
> -----
>
> It seems the domU can't write back to the DRBD device, because after I
> destroy the domU (there is no other way to exit), I get the following
> error:
> libxl: error: libxl_exec.c:117:libxl_report_child_exitstatus: /etc/xen/scripts/block-drbd remove [1380] exited with error status 1
> libxl: error: libxl_device.c:1259:device_hotplug_child_death_cb: script: /etc/xen/scripts/block-drbd failed; error detected.
>
> The domU can't release the DRBD device, and it looks like it cannot be
> released in the xenstore either:
> root@xen18:~# xl list
> Name                          ID   Mem VCPUs      State   Time(s)
> Domain-0                       0  4096     4     r-----     144.7
> (null)                         1     0    16     --p--d     387.5
>
> I tried many tests with different DRBD configs, but no luck.
> Sometimes Windows survives the disconnection, but when the secondary
> reconnects, it freezes just like on disconnect.
>
> I did not have the problem if:
> - the domU OS is Linux with the same config, or
> - the Xen PV (VBD) drivers are not installed in the domU.
>
> Does the latest (unsigned) driver, or any other Windows driver build,
> have any debug options?
>
> Thank you for your help.
> Best Regards:
>
> Attila
>
>
Default builds of the Windows PV drivers send their log data over
xen_platform_log; you can get it from your QEMU log.
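As a quick sketch of where to look (assuming the default libxl log
location in dom0 and the standard XENBUS/XENVBD/XENVIF/XENNET driver
names; the fallback sample line below is fabricated purely so the grep
has something to illustrate):

```shell
# Assumed default libxl device-model log path in dom0; adjust to your setup.
QEMU_LOG=/var/log/xen/qemu-dm-w2022.log

# Fallback for illustration only: a made-up line in the platform-log style,
# used when the real log is not present on this machine.
[ -f "$QEMU_LOG" ] || {
    QEMU_LOG=/tmp/qemu-dm-sample.log
    printf 'xen platform: XENVBD|DriverEntry: version 9.1.x\n' > "$QEMU_LOG"
}

# Pull out lines emitted by the Windows PV drivers.
grep -E 'XEN(BUS|VBD|VIF|NET)' "$QEMU_LOG"
```

Watching the log live with `tail -f` while reproducing the freeze should
show whether the PV block driver reports anything at the moment the
PingAck timeout hits.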
Best regards,
Ngoc Tu Dinh | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech