
Re: [Xen-devel] [xen-unstable test] 108068: regressions - FAIL



On 02/05/17 13:45, Ian Jackson wrote:
> Jan Beulich writes ("Re: [Xen-devel] [xen-unstable test] 108068: regressions 
> - FAIL"):
>> On 01.05.17 at 20:49, <osstest-admin@xxxxxxxxxxxxxx> wrote:
>> This has been recurring for the last few flights, but I wonder whether
>>
>> 2017-05-01 13:18:52 Z executing ssh ... root@172.16.144.40 readlink 
>> /dev/italia0-vg/win.guest.osstest-disk 
>> 2017-05-01 13:18:52 Z executing ssh ... root@172.16.144.40 lvdisplay --colon 
>> /dev/italia0-vg/win.guest.osstest-disk 
>> 2017-05-01 13:18:53 Z lvdisplay output says device is still open: 
>> /dev/italia0-vg/win.guest.osstest-disk:italia0-vg:3:1:-1:2:20480000:2500:-1:0:-1:253:2
>>  
>> 2017-05-01 13:18:53 Z executing ssh ... root@172.16.144.40 umount 
>> /dev/italia0-vg/win.guest.osstest-disk 
>> umount: /dev/italia0-vg/win.guest.osstest-disk: not mounted
>> 2017-05-01 13:18:53 Z command nonzero waitstatus 8192: timeout 60 ssh -o 
>> StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=100 -o 
>> ServerAliveInterval=100 -o PasswordAuthentication=no -o 
>> ChallengeResponseAuthentication=no -o 
>> UserKnownHostsFile=tmp/t.known_hosts_108068.test-amd64-i386-xl-qemut-winxpsp3-vcpus1
>>  root@172.16.144.40 umount /dev/italia0-vg/win.guest.osstest-disk 
>> status 8192 at Osstest/TestSupport.pm line 442.
>>
>> indicates an environmental problem rather than a
>> software-under-test one (the more that the single commit
>> being tested can't possibly influence host or guest behavior).
> This is almost certainly not an environmental problem.  What seems to
> be happening is that the guest shutdown/teardown is going wrong
> somehow.
>
> http://logs.test-lab.xenproject.org/osstest/logs/108068/test-amd64-i386-xl-qemut-winxpsp3-vcpus1/16.ts-guest-stop.log
>
> shows this:
>
> 2017-05-01 13:18:27 Z executing ssh ... root@172.16.144.40 xl shutdown -wF 
> win.guest.osstest 
> Shutting down domain 17
> PV control interface not available: sending ACPI power button event.
> Waiting for 1 domains
> Domain 17 has been shut down, reason code 1
> 2017-05-01 13:18:36 Z executing ssh ... root@172.16.144.40 xl list 
> 2017-05-01 13:18:36 Z guest win.guest.osstest state is psr 
>
> So the guest has been shut down in the sense that xl shutdown -w
> has exited (-w means to wait for the shutdown), but not in the sense
> that the domain has been destroyed.
>
> osstest spends 14 seconds checking that the guest doesn't respond to
> ping (this is probably a bit pointless, TBH):
>
> 2017-05-01 13:18:50 Z ping 172.16.146.243 down 
>
> Then the next step tries to start the guest.  But it finds that the
> backing block device is in use.  The command that fails is there so
> that this test script can be re-run in certain ad-hoc by-hand tests:
> it is trying to unmount the block device, on the theory that if it is
> shown as open in LVM, that is probably because it's mounted.  The
> unmount fails.
>
> The underlying problem is that the block backend still has the guest
> block device open.  Indeed, during the logs capture we see
>
> http://logs.test-lab.xenproject.org/osstest/logs/108068/test-amd64-i386-xl-qemut-winxpsp3-vcpus1/italia0-output-xl_list
>
> the guest is still there:
>
> Name                                        ID   Mem VCPUs    State   Time(s)
> Domain-0                                     0   511     4     r-----     913.9
> win.guest.osstest                           18  1536     1     r-----      16.0
>
> (that's at 2017-05-01 13:18:56)
>
> I think the guest that was shut down was domid 17 and this new one is
> domid 18.  This logfile
>
> http://logs.test-lab.xenproject.org/osstest/logs/108068/test-amd64-i386-xl-qemut-winxpsp3-vcpus1/italia0---var-log-xen-xl-win.guest.osstest--incoming.log
>
> shows domid 17 shutting down and then this message
>
>  Done. Rebooting now
>
> and then it seems to start the domain again.
>
> Is it possible that something has changed which means that Windows
> (sometimes?) doesn't respond to an ACPI power button event by shutting
> down, but by rebooting ?

This is known, and has definitely been discussed on xen-devel before
(although it involved IanC last time he looked at these tests, so a
while ago now).

In an ACPI view of the world, the two pieces of information you can
convey are "The user pressed the power button" and "The user pressed the
sleep button".

Windows typically defaults these to suspend and sleep, not shutdown.
Neither suspend nor sleep is typically available to VMs (unless you
alter the acpi_* defaults in the xl.cfg file).

You must explicitly change the Windows defaults to always treat the
power button as poweroff, or install PV drivers, which will handle the
PV shutdown request and DTRT before the toolstack falls back to an ACPI
event.
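
To tell these cases apart from the test side, one option is to compare
the domid before and after the shutdown request. The following is only a
rough sketch, not what osstest does: parse_xl_list and classify_shutdown
are hypothetical helper names, and the sample text is the `xl list`
output quoted above (domid 17 was shut down, domid 18 is listed
afterwards).

```python
def parse_xl_list(text):
    """Map domain name -> domid from the text output of `xl list`."""
    domains = {}
    for line in text.splitlines():
        fields = line.split()
        # The header row is skipped because its ID column is not numeric.
        if len(fields) >= 2 and fields[1].isdigit():
            domains[fields[0]] = int(fields[1])
    return domains


def classify_shutdown(name, old_domid, xl_list_text):
    """Return 'gone', 'still-present' or 'rebooted' for a guest.

    Hypothetical helper: a guest listed under a different domid than the
    one that was shut down is assumed to have been restarted.
    """
    domid = parse_xl_list(xl_list_text).get(name)
    if domid is None:
        return "gone"
    return "still-present" if domid == old_domid else "rebooted"


# Sample taken from the xl list output quoted earlier in this thread.
SAMPLE = """\
Name                                        ID   Mem VCPUs    State   Time(s)
Domain-0                                     0   511     4     r-----     913.9
win.guest.osstest                           18  1536     1     r-----      16.0
"""

print(classify_shutdown("win.guest.osstest", 17, SAMPLE))  # prints "rebooted"
```

A "rebooted" result here would match the "Done. Rebooting now" message
in the xl log, i.e. the guest treated the power button as something
other than poweroff.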

Another issue which gets in the way is Windows deciding to install
updates, which can result in reboots at any point, even when other
options have been selected.  I presume the COLO firewall should prevent
all behaviour like that?

~Andrew

