
Re: Windows domu DRBD backend problem



Hi Attila,

On 23/04/2025 19:28, Kotán Attila wrote:
> Hello Tu Dinh,
> Update info:
> I bought an NVMe drive today for testing, and unfortunately the problem
> is still present when the DRBD backend is NVMe.
> I had previously tested this with a primary node that is not a DELL
> server (a desktop-class computer).
> It definitely seems related only to DELL servers, or maybe to the
> multiprocessor environment. I use only DELL servers and have no
> information about other vendors.
>
>
> Thank you for your advice.
> I tried to capture all the info / output:
>
> DRBD configs
> - global_common.conf
> -----
> global {
>          usage-count yes;
>          udev-always-use-vnr; # treat implicit the same as explicit volumes
> }
>
> common {
>          handlers {
>          }
>
>          startup {
>          }
>
>          options {
>          }
>
>          disk {
>                  on-io-error     detach;
>                  resync-rate         160M;
>          }
>
>          net {
>          }
> }
> -----
>
> - w2022_system.res
> -----
> resource w2022_system {
>    protocol C;
>
>    net {
>    }
>
>    syncer {
>     }
>
>    on xen18 {
>      device    /dev/drbd0;
>      disk      /dev/NVME01/w2022_system;
>      address   172.16.16.8:7800;
>      meta-disk internal;
>    }
>
>    on xen16 {
>      device     /dev/drbd0;
>      disk       /dev/VG02/w2022_system;
>      address    172.16.16.6:7800;
>      meta-disk  internal;
>    }
> }
> -----
>
> Domu config
> - w2022.cfg
> -----
> name = 'w2022'
> builder = 'hvm'
> memory = 16384
> #shadow_memory = 8
> vcpus=16
> uuid = 'cac0559e-06fd-42fc-a92f-fa2d8cadaff1'
> vif = [ 'bridge=xenbr0, mac=00:11:6c:1c:49:17' ]
> disk = [ 'drbd:w2022_system,xvda,w', ]
> boot='dc'
> vnc=1
> vncunused=0
> vnclisten = '0.0.0.0'
> vncdisplay=2
> stdvga=1
> on_poweroff = 'destroy'
> on_reboot = 'restart'
> on_crash = 'restart'
> usb=1
> usbdevice=['tablet']
> -----
>
> Nothing special in the config.
>
> I tested with Debian 10, 11, 12 and testing (maybe trixie) as Domain-0.
> The DomU I tried is Windows 7 or Windows 2022.
> The test environment is:
> - Node1 (xen18) DELL T630 with PERC H730 (1G, BBU) or 1TB NVME as primary
> - Node2  (xen16) DELL R730XD with PERC H730mini (1G, BBU) as secondary
>
> I have the same problem in another environment with two DELL R640 servers.
>
> The node1 kern.log with PERC (DELL Raid controller) virtual disk DRBD
> backend:
> -----
> Apr 23 12:46:22 xen18 kernel: [  574.527385] drbd w2022_system: PingAck
> did not arrive in time.
> Apr 23 12:46:22 xen18 kernel: [  574.527464] drbd w2022_system:
> peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure )
> pdsk( UpToDate -> DUnknown )
> Apr 23 12:46:22 xen18 kernel: [  574.527600] block drbd0: new current
> UUID 19B04B0803CC3C87:CB60C112A5C9EA5D:EBEF8CB4948C160D:EBEE8CB4948C160D
> Apr 23 12:46:22 xen18 kernel: [  574.527661] drbd w2022_system:
> ack_receiver terminated
> Apr 23 12:46:22 xen18 kernel: [  574.527665] drbd w2022_system:
> Terminating drbd_a_w2022_sy
> Apr 23 12:46:22 xen18 kernel: [  574.583747] drbd w2022_system:
> Connection closed
> Apr 23 12:46:22 xen18 kernel: [  574.584035] drbd w2022_system:
> conn( NetworkFailure -> Unconnected )
> Apr 23 12:46:22 xen18 kernel: [  574.584038] drbd w2022_system: receiver
> terminated
> Apr 23 12:46:22 xen18 kernel: [  574.584041] drbd w2022_system:
> Restarting receiver thread
> Apr 23 12:46:22 xen18 kernel: [  574.584043] drbd w2022_system: receiver
> (re)started
> Apr 23 12:46:22 xen18 kernel: [  574.584052] drbd w2022_system:
> conn( Unconnected -> WFConnection )
> -----
>
> The node1 kern.log with NVME DRBD backend:
> -----
> Apr 23 10:51:18 xen18 kernel: [  912.800847] drbd w2022_system: PingAck
> did not arrive in time.
> Apr 23 10:51:18 xen18 kernel: [  912.800930] drbd w2022_system:
> peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure )
> pdsk( UpToDate -> DUnknown )
> Apr 23 10:51:18 xen18 kernel: [  912.807793] block drbd0: new current
> UUID 2361269D52925DF1:AB440CB0842F155D:25CE81C2C028E09E:74D2255AB30D4115
> Apr 23 10:51:18 xen18 kernel: [  912.811762] drbd w2022_system:
> ack_receiver terminated
> Apr 23 10:51:18 xen18 kernel: [  912.811768] drbd w2022_system:
> Terminating drbd_a_w2022_sy
> Apr 23 10:51:18 xen18 kernel: [  912.853400] drbd w2022_system:
> Connection closed
> Apr 23 10:51:18 xen18 kernel: [  912.853723] drbd w2022_system:
> conn( NetworkFailure -> Unconnected )
> Apr 23 10:51:18 xen18 kernel: [  912.853727] drbd w2022_system: receiver
> terminated
> Apr 23 10:51:18 xen18 kernel: [  912.853729] drbd w2022_system:
> Restarting receiver thread
> Apr 23 10:51:18 xen18 kernel: [  912.853732] drbd w2022_system: receiver
> (re)started
> Apr 23 10:51:18 xen18 kernel: [  912.853740] drbd w2022_system:
> conn( Unconnected -> WFConnection )
> -----
>
> It seems the DomU can't write back to the DRBD device, because after I
> destroy the DomU (there is no other way to exit) I get the following error:
> libxl: error: libxl_exec.c:117:libxl_report_child_exitstatus: /etc/xen/
> scripts/block-drbd remove [1380] exited with error status 1
> libxl: error: libxl_device.c:1259:device_hotplug_child_death_cb:
> script: /etc/xen/scripts/block-drbd failed; error detected.
>
> The DomU can't release the DRBD device, and it looks like it is not
> released in xenstore either:
> root@xen18:~# xl list
> Name                                        ID   Mem VCPUs State    Time(s)
> Domain-0                                     0  4096     4 r-----     144.7
> (null)                                       1     0    16 --p--d     387.5
>
> I ran many tests with different DRBD configs, but no luck.
> Sometimes Windows survives the disconnection, but when the secondary
> reconnects it freezes just as it does on disconnect.
>
> I have no problem if:
> - the DomU OS is Linux with the same config.
> - the Xen PV (VBD) driver is not installed in the DomU.
>
> Do the latest (unsigned) or any other Windows drivers have any debug options?
>
> Thank you for your help.
> Best Regards:
>
> Attila
>
>
Default builds of the Windows PV drivers send their log data over
xen_platform_log; you can get it from your QEMU log.
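
If the QEMU log is noisy, a small filter can help isolate the driver
messages. The following is only a minimal sketch, not an official tool: the
marker string "xen platform" and the helper names are assumptions made for
illustration, so adjust the marker to whatever your QEMU build actually
prints. Depending on the QEMU version, the xen_platform_log trace event may
also need to be enabled before anything shows up in the log.

```python
from typing import Iterable, Iterator

# Assumption: driver messages forwarded by QEMU contain this substring.
# Check a few lines of your own QEMU log and change it if needed.
DEFAULT_MARKER = "xen platform"


def filter_platform_log(lines: Iterable[str],
                        marker: str = DEFAULT_MARKER) -> Iterator[str]:
    """Yield only the lines containing `marker`, without the trailing newline."""
    for line in lines:
        if marker in line:
            yield line.rstrip("\n")


def print_platform_log(path: str, marker: str = DEFAULT_MARKER) -> None:
    """Print the matching lines of the log file at `path`."""
    # errors="replace" keeps going if the log contains non-UTF-8 bytes.
    with open(path, errors="replace") as log:
        for line in filter_platform_log(log, marker):
            print(line)
```

With an xl toolstack the log would typically be something like
/var/log/xen/qemu-dm-w2022.log (the path is an assumption based on the domain
name in your config), so you could call
print_platform_log("/var/log/xen/qemu-dm-w2022.log") right after reproducing
the freeze.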

Best regards,


Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech





 

