
Re: [Xen-devel] Xen PVM: Strange lockups when running PostgreSQL load



On 17.10.2012 17:35, Andrew Cooper wrote:

>> (XEN) Event channel information for domain 1:
>> (XEN) Polling vCPUs: {1,4,6}
>> (XEN) port [p/m]
>> (XEN) 4 [1/1]: s=6 n=0 x=0
>> (XEN) 10 [0/1]: s=6 n=1 x=0
>> (XEN) 28 [0/1]: s=6 n=4 x=0
>> (XEN) 40 [0/1]: s=6 n=6 x=0
>>
> s = state.  0 = free, 1 = reserved, 2 = unbound, 3 = inter-domain, 4 =
> pirq, 5 = virq, 6 = ipi
> n = target vcpu id to notify
> x = boolean indicating whether xen is a consumer of the event channel or
> not.
> 
> d = target domain (when appropriate)  In this case, p is the target port.
> 

Thanks (at least something learned today :)). One thing I noticed here: in the
event channel info above, pending is 0 for channels 10, 28 and 40 (and set for
4, which is the spinlock IPI for CPU 0). But in the VCPU info below (another
unknown: has=T and F) it says upcall_pend=01 for all of them. Unfortunately
that might just mean that things changed in between...
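
For reference, a minimal sketch of where those bits live in the classic
2-level event channel ABI (field names follow xen/interface/xen.h, but the
struct names and the helper below are made up for illustration): the per-port
pending/mask bits sit in shared_info arrays, while upcall_pend is a separate
per-vcpu summary flag, so the two can look inconsistent within a single dump.

/* Sketch only -- not the real headers. */
#include <stdint.h>

typedef unsigned long xen_ulong_t;

struct vcpu_info_sketch {
    uint8_t     evtchn_upcall_pending;  /* "upcall_pend" in the dump     */
    uint8_t     evtchn_upcall_mask;     /* "upcall_mask" in the dump     */
    xen_ulong_t evtchn_pending_sel;     /* which pending word(s) to scan */
};

struct shared_info_sketch {
    struct vcpu_info_sketch vcpu_info[32];
    xen_ulong_t evtchn_pending[sizeof(xen_ulong_t) * 8]; /* per-port "p" */
    xen_ulong_t evtchn_mask[sizeof(xen_ulong_t) * 8];    /* per-port "m" */
};

/* Hypothetical helper: is a given port pending and unmasked? */
static int port_pending_unmasked(const struct shared_info_sketch *s, int port)
{
    int word = port / (sizeof(xen_ulong_t) * 8);
    xen_ulong_t bit = 1UL << (port % (sizeof(xen_ulong_t) * 8));

    return (s->evtchn_pending[word] & bit) && !(s->evtchn_mask[word] & bit);
}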

>> (XEN) VCPU0: CPU3 [has=T] flags=0 poll=0 upcall_pend = 01, upcall_mask = 01
>> dirty_cpus={3} cpu_affinity={0-127}
>> (XEN) No periodic timer
>> (XEN) VCPU1: CPU7 [has=F] flags=1 poll=10 upcall_pend = 01, upcall_mask = 01
>> dirty_cpus={} cpu_affinity={0-127}
>> (XEN) No periodic timer

>> (XEN) VCPU4: CPU6 [has=F] flags=1 poll=28 upcall_pend = 01, upcall_mask = 01
>> dirty_cpus={} cpu_affinity={0-127}
>> (XEN) No periodic timer
>> (XEN) VCPU6: CPU0 [has=F] flags=1 poll=40 upcall_pend = 01, upcall_mask = 01
>> dirty_cpus={} cpu_affinity={0-127}
>> (XEN) No periodic timer
> 
> So in this case, vcpu 1 is in a poll, on port 10, which is an IPI event
> channel for itself.
> 
> Same for vcpu 4, except it is on port 28, and for vcpu 6 on port 40.
> 

> 
> I wonder if there is possibly a race condition between notifying that a
> lock has been unlocked, and another vcpu trying to poll after deciding
> that the lock is locked.

There has to be something somewhere, I just cannot spot it. The unlocking CPU
will do a wmb() before setting the lock to 0, then a mb(), and then check for
spinners. When failing the quick path, a locker will first set the lock
spinner entry, then do a wmb() and increment the spinners count. After that it
clears the pending event and then checks the lock again before actually going
into the poll.
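
For concreteness, roughly the ordering I mean (a simplified sketch of the old
byte-lock code in arch/x86/xen/spinlock.c; all helper names below are
stand-ins, not the exact kernel symbols):

struct xen_spinlock_sketch {
    unsigned char  lock;      /* 0 = free, 1 = held                    */
    unsigned short spinners;  /* vcpus in, or about to enter, the poll */
};

/* Stand-ins for the real primitives. */
extern void wmb_(void);
extern void mb_(void);
extern void set_my_spinner_entry(struct xen_spinlock_sketch *xl);
extern void clear_my_spinner_entry(void);
extern int  try_take(struct xen_spinlock_sketch *xl);      /* cmpxchg      */
extern void atomic_add(unsigned short *v, int d);
extern void clear_evtchn_pending(int irq);
extern void poll_evtchn(int irq);                          /* SCHEDOP_poll */
extern void notify_one_spinner(struct xen_spinlock_sketch *xl);

/* Slow path; returns 1 if the lock was taken here, otherwise the caller
 * retries the fast path after being woken up. */
static int spin_lock_slow(struct xen_spinlock_sketch *xl, int irq)
{
    int got;

    set_my_spinner_entry(xl);       /* advertise which lock we wait on   */
    wmb_();                         /* ... before bumping the count      */
    atomic_add(&xl->spinners, +1);

    clear_evtchn_pending(irq);      /* drop any stale wakeup             */
    got = try_take(xl);             /* re-check the lock before polling  */
    if (!got)
        poll_evtchn(irq);           /* block until the unlocker kicks us */

    atomic_add(&xl->spinners, -1);
    clear_my_spinner_entry();
    return got;
}

static void spin_unlock_kick(struct xen_spinlock_sketch *xl)
{
    wmb_();                         /* prior stores before the release   */
    xl->lock = 0;                   /* release the lock                  */
    mb_();                          /* release visible before the check  */
    if (xl->spinners)               /* anyone (about to be) polling?     */
        notify_one_spinner(xl);     /* send the spinlock IPI evtchn      */
}

The intent of the mb()/re-check pairing is Dekker-style: either the unlocker
sees spinners != 0, or the locker's re-check sees lock == 0 before it goes
into the poll.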

> 
> The other option is that there is a bug in working out which event
> channel to notify when a lock is unlocked.

I thought I had spotted one such thing, which is what my patch tried to fix.
Another train of thought was that some other CPU always grabs the lock as soon
as it gets released, preventing any CPU in the poll from ever succeeding. But
then the lock would show up as locked...

> 
> ~Andrew
> 
>>
>> Backtraces would be somewhat inconsistent (as always). Note, I should
>> mention that I still had a kernel with my patch applied on that guest.
>> That changes things a bit (actually it takes a bit longer to hang but
>> again that might be just a matter of timing). The strange lock state of
>> 2 spinners on an unlocked lock remains the same with or without it.
>>
>> One question about the patch actually, would anybody think that there
>> could be a case where the unlocking cpu has itself on the spinners
>> list? I did not think so but that might be wrong.
>>>
>>> The IRQ handler for the spinlock evtchn in Linux is:
>>> static irqreturn_t dummy_handler(int irq, void *dev_id)
>>> {
>>>         BUG();
>>>         return IRQ_HANDLED;
>>> }
>>>
>>> and right after we register it:
>>> disable_irq(irq); /* make sure it's never delivered */
>>>
>>> There is no enable -- ignoring bugs, of which there have been a couple
>>> of instances, but those trigger the BUG() so are pretty obvious.
>>>
>>> Ian.
>>>
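
For context, the spinlock event channel is never meant to be delivered as an
interrupt at all: the slow path consumes it via the SCHEDOP_poll hypercall
(xen_poll_irq), which is why the registered handler can be a dummy that only
BUG()s. A simplified sketch of that poll, not a verbatim copy of
drivers/xen/events.c:

static void poll_one_evtchn(evtchn_port_t port)
{
    struct sched_poll poll = {
        .nr_ports = 1,
        .timeout  = 0,          /* 0 = block until the port is pending */
    };

    set_xen_guest_handle(poll.ports, &port);

    /* Blocks this vcpu in Xen until "port" becomes pending (or is kicked). */
    if (HYPERVISOR_sched_op(SCHEDOP_poll, &poll) != 0)
        BUG();
}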



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

