WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel][PV-ops][PATCH] Netback: Fix PV network issue for netback

On Thu, 2010-07-01 at 16:47 +0100, Ian Campbell wrote:
> On Thu, 2010-07-01 at 16:29 +0100, Jeremy Fitzhardinge wrote:
> > On 07/01/2010 04:48 PM, Ian Campbell wrote:
> > > On Mon, 2010-06-21 at 12:14 +0100, Jeremy Fitzhardinge wrote:
> > >   
> > >> Subject: [PATCH] xen/netback: make sure all the group structures are 
> > >> initialized before starting async code
> > >>
> > >> Split the netbk group array initialization into two phases: one to do
> > >> simple "can't fail" initialization of lists, timers, locks, etc; and a
> > >> second pass to allocate memory and start the async code.
> > >>
> > >> This makes sure that all the basic stuff is in place by the time the 
> > >> async code
> > >> is running.
> > >>     
> > > Paul noticed a crash in netback init because...
> > >
> > >   
> > >> @@ -1764,16 +1766,6 @@ static int __init netback_init(void)
> > >>                  netbk->netbk_tx_pending_timer.function =
> > >>                          netbk_tx_pending_timeout;
> > >>  
> > >> -                netbk->mmap_pages =
> > >> -                        alloc_empty_pages_and_pagevec(MAX_PENDING_REQS);
> > >> -                if (!netbk->mmap_pages) {
> > >> -                        printk(KERN_ALERT "%s: out of memory\n", 
> > >> __func__);
> > >> -                        del_timer(&netbk->netbk_tx_pending_timer);
> > >> -                        del_timer(&netbk->net_timer);
> > >> -                        rc = -ENOMEM;
> > >> -                        goto failed_init;
> > >> -                }
> > >> -
> > >>                  for (i = 0; i < MAX_PENDING_REQS; i++) {
> > >>                          page = netbk->mmap_pages[i];
> > >>                          SetPageForeign(page, netif_page_release);
> > >>     
> > > ...this dereference of netbk->mmap_pages[i]...
> > >
> > >   
> > >> @@ -1786,6 +1778,26 @@ static int __init netback_init(void)
> > >>     
> > > [...]
> > >   
> > >> +                netbk->mmap_pages =
> > >> +                        alloc_empty_pages_and_pagevec(MAX_PENDING_REQS);
> > >>     
> > > ... happens before this initialisation of the array.
> > >   
> > 
> > Hm, I hadn't meant to commit that properly.  I had it locally and
> > accidentally pushed it out.
> > 
> > I only did that patch as an RFC in response to an issue alluded to by
> > Dongxiao (or was it you?) about things not being fully initialized by
> > the time the async code starts.  Is this a real issue, and if so, what's
> > the correct fix?
> 
> I don't think there is an actual current issue, just a potential one
> since we are relying on data structures being zeroed rather than
> properly initialised to keep the async code from running off into the
> weeds, it just seemed a little fragile this way.
> 
> Originally I said:
> >> The crash is in one of the calls to list_move_tail and I think it is
> >> because netbk->pending_inuse_head not being initialised until after
> >> the 
> >> threads and/or tasklets are created (I was running in threaded mode).
> >> Perhaps even though we are now zeroing the netbk struct those fields
> >> should still be initialised before kicking off any potentially
> >> asynchronous tasks?
> 
> this specific issue was fixed by zeroing the netbk array as it is
> allocated, I just thought we could make things more robust by not
> triggering the async code until everything was fully setup.

I suspect simply not letting the kthread off the leash until the end of
the loop may be sufficient. The tasklet case I think is safe until they
are explicitly tickled.

e.g. something like:

diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c
index 29b60c8..36f3cc7 100644
--- a/drivers/xen/netback/netback.c
+++ b/drivers/xen/netback/netback.c
@@ -1794,7 +1794,6 @@ static int __init netback_init(void)
 
                        if (!IS_ERR(netbk->kthread.task)) {
                                kthread_bind(netbk->kthread.task, group);
-                               wake_up_process(netbk->kthread.task);
                        } else {
                                printk(KERN_ALERT
                                        "kthread_run() fails at netback\n");
@@ -1820,6 +1819,9 @@ static int __init netback_init(void)
                spin_lock_init(&netbk->net_schedule_list_lock);
 
                atomic_set(&netbk->netfront_count, 0);
+
+               if (MODPARM_netback_kthread)
+                       wake_up_process(netbk->kthread.task);
        }
 
        netbk_copy_skb_mode = NETBK_DONT_COPY_SKB;





_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel