
Re: [Xen-devel] State of GPLPV tests - 28.11.11



On 29.11.2011 00:16, James Harper wrote:
>> I am still running tests 7 days a week on two test systems. The results
>> are quite discouraging, though. After experiencing crash after crash I
>> wanted to test whether the configuration I called "stable" (Xen 4.0.1,
>> GPLPV 0.11.0.213, dom0 kernel 2.6.32.18-pvops0-ak3) was indeed stable.
>> But even that config crashed when running my torture test. It is stable
>> on our production systems - running other workloads, of course.
> What crash are you getting these days? Is it the same one as you used to
> get?

Yes, still exactly the same crashes.

Good news: I think I have found the bug. Since I am not really a Xen or Windows kernel developer I cannot say for sure, but here is what I found:

When the domU hung I ran xentop and found that the number of vbd read requests was a number like 0x7FFFzzzz in hex, which led me to a hypothesis: GPLPV crashes as soon as the number of disk requests reaches 2^32. On my hardware, at 5000 IOPS, that point is reached in

  2^32 requests / 5000 IOPS / 3600 sec-per-hour / 24 hours-per-day = 9.94 days

And there we go: that matches the 9-10 days I was always seeing.
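
As a sanity check, here is a small standalone program that redoes the arithmetic and demonstrates the 32-bit wraparound itself (the 5000 IOPS figure is just the rate measured above):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
    const double iops = 5000.0;            /* measured request rate */
    double secs = 4294967296.0 / iops;     /* time to issue 2^32 requests */
    printf("wrap after %.2f days\n", secs / 3600.0 / 24.0); /* ~9.94 */

    /* a free-running 32-bit counter wraps: 0xFFFFFFF7 + 19 == 10 */
    uint32_t cons = 0xFFFFFFF7u;
    printf("cons + 19 = %u\n", (unsigned)(cons + 19)); /* prints 10 */
    return 0;
  }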

I studied the source code of blkback/blktap/aio and found nothing wrong there. But in GPLPV's use of the ring macros I found suspicious code, present in every version of GPLPV I have ever used:

  while (more_to_do)
  {
    rp = xvdd->ring.sring->rsp_prod;
    KeMemoryBarrier();
    /* rsp_cons and rsp_prod are free-running 32-bit counters; once
       rsp_cons has wrapped past 2^32 while rp has not, "i < rp" is
       false and the loop is skipped */
    for (i = xvdd->ring.rsp_cons; i < rp; i++)
    {
      rep = XenVbd_GetResponse(xvdd, i);

If rp is, for example, 10 and xvdd->ring.rsp_cons is 0xFFFFFFF7, then the condition i < rp is false from the start, the for loop is skipped, responses are not delivered, and we see the hang.
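
For comparison, here is a minimal sketch of the wrap-safe idiom that upstream ring consumers such as Linux blkfront use (same variables as the snippet above; this shows the general idiom, not the actual GPLPV patch). Because both indices are free-running counters modulo 2^32, comparing them with != instead of < keeps working across the wrap:

  while (more_to_do)
  {
    rp = xvdd->ring.sring->rsp_prod;
    KeMemoryBarrier(); /* read rsp_prod before reading the responses */
    /* "!=" instead of "<": the consumer index still catches up with
       the producer index even after either of them wraps at 2^32 */
    for (i = xvdd->ring.rsp_cons; i != rp; i++)
    {
      rep = XenVbd_GetResponse(xvdd, i);
      /* ... process the response as before ... */
    }
    xvdd->ring.rsp_cons = i;
    /* re-check for responses that arrived meanwhile and update
       more_to_do (ring.h provides RING_FINAL_CHECK_FOR_RESPONSES) */
    RING_FINAL_CHECK_FOR_RESPONSES(&xvdd->ring, more_to_do);
  }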

Regards,
Andreas


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
