On Wed, Jul 02, 2008 at 07:53:27PM +0900, Isaku Yamahata wrote:
> On Wed, Jul 02, 2008 at 04:20:33PM +1000, Simon Horman wrote:
>
> > I have done some more investigations and it does really
> > seem that calling ia64_sal_get_state_info() via ia64_log_queue()
> > in ia64_mca_cpe_int_caller() causes the hypervisor to lock
> > up when my EFI RR patches are applied.
> >
> > As you point out, if xmalloc() was ever called by ia64_log_queue()
> > in this context then a BUG would be triggered. As we are not
> > seeing that in the wild, then that case must not occur (or occur
> > so rarely that no one has seen and reported it yet). This means
> > that ia64_sal_get_state_info() must be returning zero.
> >
> > If I understand correctly, ia64_log_queue() does more or less nothing
> > if ia64_sal_get_state_info() returns zero. Or in other words, if
> > ia64_sal_get_state_info() returns zero then for one reason or another
> > there is no information available at that time - we know that because
> > if there was information available then xmalloc() would be called and
> > a BUG would be triggered.
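
To make the argument above concrete, here is a minimal sketch of the
control flow as I understand it. All names are hypothetical stand-ins
(sketch_get_state_info_size(), sketch_alloc(), sketch_log_queue(), the
in_irq flag); this is not the real Xen/ia64 code, and the assert() only
models the BUG that allocating in interrupt context would trigger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state for the sketch; none of these are Xen symbols. */
static bool   in_irq;         /* true while "in interrupt context" */
static size_t pending_bytes;  /* size the stubbed SAL call reports */
static int    alloc_calls;    /* how many times we tried to allocate */

/* Stand-in for ia64_sal_get_state_info(): 0 means "nothing pending". */
static size_t sketch_get_state_info_size(void)
{
    return pending_bytes;
}

/* Stand-in for xmalloc(): the assert models the BUG in irq context. */
static void sketch_alloc(size_t n)
{
    assert(!in_irq);
    assert(n > 0);
    alloc_calls++;
}

/* Stand-in for ia64_log_queue(): returns true if a record was queued. */
static bool sketch_log_queue(void)
{
    size_t n = sketch_get_state_info_size();

    if (n == 0)
        return false;   /* nothing to log, so no allocation happens */

    sketch_alloc(n);    /* only reached when there is data to copy */
    return true;
}
```

Under these assumptions, a zero return is the only path that avoids the
allocation, which is why the absence of BUG reports suggests the call
has been returning zero in this context.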
> >
> >
> > Given that without the EFI RR patches the call to ia64_log_queue()
> > in ia64_mca_cpe_int_caller() seems to do nothing and with them
> > applied a crash occurs, I wonder if the best way forward is to simply
> > remove the call.
> >
> > The section on SAL_GET_STATE (==ia64_sal_get_state_info()) in the System
> > Abstraction Layer Specification (Dec 2003) does state "In response to
> > the MCA, Processor CMC, or Corrected Platform event, The operating
> > system must call the procedure to obtain all the pending processor and
> > platform error information that triggered the event."
> >
> > Does that apply to situations when ia64_mca_cpe_int_caller() is called?
> > And if so, can calling ia64_log_queue() be deferred?
>
> ia64_mca_cpe_int_caller() is triggered by the polling timer,
> cpe_poll_timer, which sends IA64_CPEP_VECTOR. So I think
> ia64_log_queue() can be deferred by using a softirq or tasklet.
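
A rough sketch of the deferral idea, under stated assumptions: the
handler only records that work is pending, and a separate routine does
the real work later, outside interrupt context. Every name here
(sketch_cpe_int_handler(), sketch_run_deferred(), sketch_log_queue(),
the flags) is hypothetical; a real version would use the hypervisor's
own softirq or tasklet machinery rather than a plain flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state for the sketch; none of these are Xen symbols. */
static bool in_irq;          /* true while the simulated handler runs */
static bool log_pending;     /* set by the handler, cleared by the worker */
static int  logs_processed;  /* counts deferred runs, for demonstration */

/* Stand-in for the work that ia64_log_queue() does in the thread. */
static void sketch_log_queue(void)
{
    assert(!in_irq);         /* must not run in interrupt context */
    logs_processed++;
}

/* Stand-in for the CPE poll interrupt handler: only note the request. */
static void sketch_cpe_int_handler(void)
{
    in_irq = true;
    log_pending = true;      /* analogous to scheduling a tasklet */
    in_irq = false;
}

/* Stand-in for the softirq/tasklet body, run later by the scheduler. */
static void sketch_run_deferred(void)
{
    if (!log_pending)
        return;              /* nothing was requested since the last run */

    log_pending = false;
    sketch_log_queue();
}
```

The point of the split is that the handler itself never reaches the
logging path, so anything that is unsafe in interrupt context (such as
an allocation) only happens in the deferred routine.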
>
> To be honest, after taking a rough look at the SAL specification I
> don't understand why the VMM locks up when ia64_sal_get_state_info()
> is called. You stated that when ia64_log_queue() is called, the RID
> is already EFI's. Have you tracked down the reason, and which firmware
> call (PAL/SAL/EFI) is responsible?
I think that it varies, but I will check my logs.
> And where have you tracked the hypervisor lockup down to?
> i.e. does the hypervisor lock up in ia64_sal_get_state_info() around
> the SAL call, or right in the SAL call?
It appears to lock up right in the SAL call.
> If the lockup happens in the SAL call, what we can do is take
> a closer look at the SAL spec and make sure the calling conditions
> are met. If the lockup happens before or after the SAL call,
> presumably it would indicate a xen/ia64 VMM bug.
Ok, I will look through the specification (the new one at the link
you sent in your next email) and see if I can find anything.