[Wg-test-framework] baroque1 hardware problem
Ian Jackson writes ("Minutes All-Net synchronisation call, 20th May"):
> ACTION: Ian to send technical details, so that All-net can raise
> with supplier (Intel).

So, the problem is as follows:

Summary:
--------

Sometimes, when powered on, baroque1 does not come up. Symptoms are that the serial control lines do change (visible in the sympathy log), but no text is printed on the serial console. Waiting a long time (up to at least ten minutes) has no effect. Sending "return" on the serial console elicits no response.

After the problem has occurred, often more than one further attempt to power cycle the machine is required to get it to work again. After that it works normally until the fault recurs.

Repro method:
-------------

 * Write a pxeboot file which refers to a stock Wheezy amd64 debian-installer netboot image, and specifies a preseed file.
 * Power off (via the PDU).
 * Wait 30s.
 * Power on (via the PDU).
 * Monitor the preseed file http server access log, waiting for the preseed file to be downloaded.
 * When the preseed file has been fetched, declare "success". Then go round for the next repetition. (Generally, this means that the server is powered off in the middle of one of debian-installer's software-fetching steps.) Alternatively, after 350s, declare "failure" and stop.

Statistical information:
------------------------

 * My records show failures after the following numbers of repetitions: 96 (not quite sure about this one; data collection was affected by an unrelated network problem on my workstation), 29, 25, 26.
 * My records show the following numbers of attempts needed to get the machine to work at all again: 1, 3, 2, 3.
 * I have run the same test on baroque0. It has managed (at least) 400 consecutive power cycle restarts without problem.

Handover:
---------

I hereby hand both baroque0 and baroque1 over to you. (It seems most sensible to give you the working machine too, for comparison.)

The current setup in the colo is the PXE configuration as described above.

So I think it should be possible to reproduce the problem as follows:

 - power baroque1 off
 - wait 30s
 - power baroque1 on
 - wait for it to show life on the serial console
 - wait for it to show entry into debian-installer (eg wait for "Setting up the clock" to be printed on the serial console), then declare success and go round again

I have disconnected the serial consoles of both machines from sympathy, so you should be able to connect to them with picocom or expect on /dev/ttyRP5 and /dev/ttyRP6.

NB that I am now going to be away until next Wednesday morning.

Ian.
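
For illustration only, the serial-console loop above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script that was actually used: `power_off`, `power_on` and `read_line` are hypothetical callables that whoever runs the test would have to supply (the PDU interface is not described here), and `read_line` is assumed to return an empty string after a short wait when the console is silent, so that the overall timeout can fire. The 350s limit is borrowed from the http-log method; the sketch waits only for the debian-installer marker and does not separately check for first signs of life.

```python
import time

# Text that shows debian-installer has started (taken from the steps above).
MARKER = "Setting up the clock"


def wait_for_marker(read_line, marker=MARKER, timeout_s=350.0,
                    clock=time.monotonic):
    """Return True if `marker` appears on the console within timeout_s.

    `read_line` is a hypothetical callable: it must return one line of
    console text, or "" if nothing arrived within its own short timeout.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if marker in read_line():
            return True
    return False


def run_cycles(power_off, power_on, read_line, max_cycles,
               off_wait_s=30.0, timeout_s=350.0,
               sleep=time.sleep, clock=time.monotonic):
    """Power cycle repeatedly; return the number of successful boots
    before the first failure (or max_cycles if none failed).

    `power_off` and `power_on` are hypothetical callables that drive
    the PDU; how they do so is an assumption left to the caller.
    """
    for completed in range(max_cycles):
        power_off()
        sleep(off_wait_s)          # leave the machine off for 30s
        power_on()
        if not wait_for_marker(read_line, MARKER, timeout_s, clock):
            return completed       # machine did not reach the installer
    return max_cycles
```

A caller would wrap whatever reads /dev/ttyRP5 or /dev/ttyRP6 in `read_line`, and stop at the first failure so that the machine is left in the failed state for inspection.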