Re: Windows domu DRBD backend problem
Hi Attila,

On 23/04/2025 19:28, Kotán Attila wrote:
> Hello Tu Dinh,
>
> Update info:
> Today I bought an NVMe drive for testing, and unfortunately the problem is
> also present when the DRBD backend is NVMe.
> I had previously tested this scenario with a primary node that is not a DELL
> server (a desktop-class computer).
> It definitely seems to be related only to DELL servers, or possibly to the
> multiprocessor environment. I use only DELL servers, so I have no information
> about other vendors.
>
>
> Thank you for your advice.
> I have tried to collect all the relevant information and output:
>
> DRBD configs
> - global_common.conf
> -----
> global {
>     usage-count yes;
>     udev-always-use-vnr; # treat implicit the same as explicit volumes
> }
>
> common {
>     handlers {
>     }
>
>     startup {
>     }
>
>     options {
>     }
>
>     disk {
>         on-io-error detach;
>         resync-rate 160M;
>     }
>
>     net {
>     }
> }
> -----
>
> - w2022_system.res
> -----
> resource w2022_system {
>     protocol C;
>
>     net {
>     }
>
>     syncer {
>     }
>
>     on xen18 {
>         device /dev/drbd0;
>         disk /dev/NVME01/w2022_system;
>         address 172.16.16.8:7800;
>         meta-disk internal;
>     }
>
>     on xen16 {
>         device /dev/drbd0;
>         disk /dev/VG02/w2022_system;
>         address 172.16.16.6:7800;
>         meta-disk internal;
>     }
> }
> -----
>
> DomU config
> - w2022.cfg
> -----
> name = 'w2022'
> builder = 'hvm'
> memory = 16384
> #shadow_memory = 8
> vcpus=16
> uuid = 'cac0559e-06fd-42fc-a92f-fa2d8cadaff1'
> vif = [ 'bridge=xenbr0, mac=00:11:6c:1c:49:17' ]
> disk = [ 'drbd:w2022_system,xvda,w', ]
> boot='dc'
> vnc=1
> vncunused=0
> vnclisten = '0.0.0.0'
> vncdisplay=2
> stdvga=1
> on_poweroff = 'destroy'
> on_reboot = 'restart'
> on_crash = 'restart'
> usb=1
> usbdevice=['tablet']
> -----
>
> Nothing special in the config.
>
> I tested with Domain-0 running Debian 10, 11, 12 and testing (probably trixie).
> I tried Windows 7 and Windows 2022 as the DomU.
> The test environment is:
> - Node1 (xen18): DELL T630 with PERC H730 (1G, BBU) or 1TB NVMe as primary
> - Node2 (xen16): DELL R730XD with PERC H730mini (1G, BBU) as secondary
>
> I also have the problem in another environment with two DELL R640 servers.
>
> The node1 kern.log with a PERC (DELL RAID controller) virtual disk as the
> DRBD backend:
> -----
> Apr 23 12:46:22 xen18 kernel: [ 574.527385] drbd w2022_system: PingAck did not arrive in time.
> Apr 23 12:46:22 xen18 kernel: [ 574.527464] drbd w2022_system: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Apr 23 12:46:22 xen18 kernel: [ 574.527600] block drbd0: new current UUID 19B04B0803CC3C87:CB60C112A5C9EA5D:EBEF8CB4948C160D:EBEE8CB4948C160D
> Apr 23 12:46:22 xen18 kernel: [ 574.527661] drbd w2022_system: ack_receiver terminated
> Apr 23 12:46:22 xen18 kernel: [ 574.527665] drbd w2022_system: Terminating drbd_a_w2022_sy
> Apr 23 12:46:22 xen18 kernel: [ 574.583747] drbd w2022_system: Connection closed
> Apr 23 12:46:22 xen18 kernel: [ 574.584035] drbd w2022_system: conn( NetworkFailure -> Unconnected )
> Apr 23 12:46:22 xen18 kernel: [ 574.584038] drbd w2022_system: receiver terminated
> Apr 23 12:46:22 xen18 kernel: [ 574.584041] drbd w2022_system: Restarting receiver thread
> Apr 23 12:46:22 xen18 kernel: [ 574.584043] drbd w2022_system: receiver (re)started
> Apr 23 12:46:22 xen18 kernel: [ 574.584052] drbd w2022_system: conn( Unconnected -> WFConnection )
> -----
>
> The node1 kern.log with the NVMe DRBD backend:
> -----
> Apr 23 10:51:18 xen18 kernel: [ 912.800847] drbd w2022_system: PingAck did not arrive in time.
> Apr 23 10:51:18 xen18 kernel: [ 912.800930] drbd w2022_system: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Apr 23 10:51:18 xen18 kernel: [ 912.807793] block drbd0: new current UUID 2361269D52925DF1:AB440CB0842F155D:25CE81C2C028E09E:74D2255AB30D4115
> Apr 23 10:51:18 xen18 kernel: [ 912.811762] drbd w2022_system: ack_receiver terminated
> Apr 23 10:51:18 xen18 kernel: [ 912.811768] drbd w2022_system: Terminating drbd_a_w2022_sy
> Apr 23 10:51:18 xen18 kernel: [ 912.853400] drbd w2022_system: Connection closed
> Apr 23 10:51:18 xen18 kernel: [ 912.853723] drbd w2022_system: conn( NetworkFailure -> Unconnected )
> Apr 23 10:51:18 xen18 kernel: [ 912.853727] drbd w2022_system: receiver terminated
> Apr 23 10:51:18 xen18 kernel: [ 912.853729] drbd w2022_system: Restarting receiver thread
> Apr 23 10:51:18 xen18 kernel: [ 912.853732] drbd w2022_system: receiver (re)started
> Apr 23 10:51:18 xen18 kernel: [ 912.853740] drbd w2022_system: conn( Unconnected -> WFConnection )
> -----
>
> It seems the DomU can't write back to the DRBD device, because after I destroy
> the DomU (there is no other way to stop it), I got the following error:
> libxl: error: libxl_exec.c:117:libxl_report_child_exitstatus: /etc/xen/scripts/block-drbd remove [1380] exited with error status 1
> libxl: error: libxl_device.c:1259:device_hotplug_child_death_cb: script: /etc/xen/scripts/block-drbd failed; error detected.
>
> The DomU can't release the DRBD device, and it apparently cannot be released
> from xenstore either:
> root@xen18:~# xl list
> Name                                        ID   Mem VCPUs      State   Time(s)
> Domain-0                                     0  4096     4     r-----     144.7
> (null)                                       1     0    16     --p--d     387.5
>
> I have run many tests with different DRBD configs, but with no luck.
> Sometimes Windows survives the disconnection, but when the secondary
> reconnects, it freezes the same way as on disconnect.
>
> I don't have the problem if:
> - the DomU OS is Linux with the same config, or
> - the Xen PV (VBD) driver is not installed in the DomU.
>
> Do the latest (unsigned) or any other Windows drivers have any debug options?
>
> Thank you for your help.
> Best Regards,
>
> Attila
>

Default builds of the Windows PV drivers send their log data over
xen_platform_log; you can find it in your QEMU log.

Best regards,

Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
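Following the suggestion to read the PV driver output from the QEMU log, here is a minimal Python sketch that keeps only the lines that look like Windows PV driver messages. It rests on a few assumptions not confirmed in the thread. The log path is taken to be the xl default, /var/log/xen/qemu-dm-<domain>.log. The marker strings (xen_platform_log, XENVBD, XENBUS) are guesses. The script name pv_log_filter.py is hypothetical. Check a real log and adjust the markers before relying on the output.

```python
"""Sketch: print the lines of a QEMU device-model log that look like
Windows PV driver output.

Assumptions (not confirmed in the thread): the log lives at the xl default
path /var/log/xen/qemu-dm-<domain>.log, and driver messages can be spotted by
the hypothetical substrings in DEFAULT_MARKERS. Adjust both for your setup.
"""
import sys
from typing import Iterable, List, Sequence

# Hypothetical markers; inspect a real log and replace them as needed.
DEFAULT_MARKERS: Sequence[str] = ("xen_platform_log", "XENVBD", "XENBUS")


def filter_pv_lines(lines: Iterable[str],
                    markers: Sequence[str] = DEFAULT_MARKERS) -> List[str]:
    """Return lines containing any marker, with trailing newlines removed."""
    return [line.rstrip("\n") for line in lines
            if any(marker in line for marker in markers)]


def main() -> None:
    # Expect exactly one argument: the path of the QEMU log to scan.
    if len(sys.argv) != 2:
        print("usage: pv_log_filter.py /var/log/xen/qemu-dm-<domain>.log",
              file=sys.stderr)
        return
    # errors="replace" keeps going if the log contains stray non-UTF-8 bytes.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for line in filter_pv_lines(log):
            print(line)


if __name__ == "__main__":
    main()
```

For the w2022 domain this would be run as "python3 pv_log_filter.py /var/log/xen/qemu-dm-w2022.log" right after reproducing the DRBD disconnect. A plain grep over the same file does the same job; the script only makes the marker list explicit and easy to extend.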
Lists.xenproject.org is hosted with RackSpace, monitoring our