[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support for IPv6 offloads



On Fri, 2013-11-15 at 11:19 +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-
> > bounces@xxxxxxxxxxxxx] On Behalf Of Paul Durrant
> > Sent: 07 November 2013 11:02
> > To: Ian Campbell
> > Cc: Boris Ostrovsky; David Vrabel; xen-devel@xxxxxxxxxxxxx
> > Subject: Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support for
> > IPv6 offloads
> > 
> > > -----Original Message-----
> > > From: Ian Campbell
> > > Sent: 07 November 2013 10:49
> > > To: Paul Durrant
> > > Cc: xen-devel@xxxxxxxxxxxxx; Boris Ostrovsky; David Vrabel
> > > Subject: Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support for
> > > IPv6 offloads
> > >
> > > On Tue, 2013-11-05 at 11:59 +0000, Paul Durrant wrote:
> > > > This patch adds support for IPv6 checksum offload and GSO when those
> > > > fetaures are available in the backend.
> > > >
> > > > Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
> > > > Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > > > Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> > > > Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
> > > > ---
> > > >  drivers/net/xen-netfront.c |  263
> > > ++++++++++++++++++++++++++++++++++++++------
> > > >  1 file changed, 231 insertions(+), 32 deletions(-)
> > > >
> > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > > index dd1011e..27212d8 100644
> > > > --- a/drivers/net/xen-netfront.c
> > > > +++ b/drivers/net/xen-netfront.c
> > > > @@ -616,7 +616,9 @@ static int xennet_start_xmit(struct sk_buff *skb,
> > > struct net_device *dev)
> > > >                 tx->flags |= XEN_NETTXF_extra_info;
> > > >
> > > >                 gso->u.gso.size = skb_shinfo(skb)->gso_size;
> > > > -               gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
> > > > +               gso->u.gso.type = (skb_shinfo(skb)->gso_type &
> > > SKB_GSO_TCPV6) ?
> > > > +                                 XEN_NETIF_GSO_TYPE_TCPV6 :
> > > > +                                 XEN_NETIF_GSO_TYPE_TCPV4;
> > > >                 gso->u.gso.pad = 0;
> > > >                 gso->u.gso.features = 0;
> > > >
> > > > @@ -808,15 +810,18 @@ static int xennet_set_skb_gso(struct sk_buff
> > > *skb,
> > > >                 return -EINVAL;
> > > >         }
> > > >
> > > > -       /* Currently only TCPv4 S.O. is supported. */
> > > > -       if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) {
> > > > +       if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4 &&
> > > > +           gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV6) {
> > > >                 if (net_ratelimit())
> > > >                         pr_warn("Bad GSO type %d\n", gso->u.gso.type);
> > > >                 return -EINVAL;
> > > >         }
> > > >
> > > >         skb_shinfo(skb)->gso_size = gso->u.gso.size;
> > > > -       skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
> > > > +       skb_shinfo(skb)->gso_type =
> > > > +               (gso->u.gso.type == XEN_NETIF_GSO_TYPE_TCPV4) ?
> > > > +               SKB_GSO_TCPV4 :
> > > > +               SKB_GSO_TCPV6;
> > > >
> > > >         /* Header must be checked, and gso_segs computed. */
> > > >         skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
> > > > @@ -856,62 +861,72 @@ static RING_IDX xennet_fill_frags(struct
> > > netfront_info *np,
> > > >         return cons;
> > > >  }
> > > >
> > > > -static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
> > > > +static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int 
> > > > len)
> > > >  {
> > > > -       struct iphdr *iph;
> > > > -       int err = -EPROTO;
> > > > -       int recalculate_partial_csum = 0;
> > > > -
> > > > -       /*
> > > > -        * A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
> > > > -        * peers can fail to set NETRXF_csum_blank when sending a GSO
> > > > -        * frame. In this case force the SKB to CHECKSUM_PARTIAL and
> > > > -        * recalculate the partial checksum.
> > > > -        */
> > > > -       if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
> > > > -               struct netfront_info *np = netdev_priv(dev);
> > > > -               np->rx_gso_checksum_fixup++;
> > > > -               skb->ip_summed = CHECKSUM_PARTIAL;
> > > > -               recalculate_partial_csum = 1;
> > > > +       if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
> > > > +               /* If we need to pullup then pullup to the max, so we
> > > > +                * won't need to do it again.
> > > > +                */
> > > > +               int target = min_t(int, skb->len, MAX_TCP_HEADER);
> > > > +               __pskb_pull_tail(skb, target - skb_headlen(skb));
> > >
> > > Should the functions "len" argument not be involved somewhere in this?
> > > What if it is bigger than MAX_TCP_HEADER for example?
> > 
> > Hmm, good point.
> > 
> 
> Actually, now I look again, this is correct. The if statement checks
> len to see if a pullup is necessary, but if one is necessary then it
> always pulls up to the max.

So it is guaranteed that len <= MAX_TCP_HEADER in all circumstances?
That might surprise some callers, since the name doesn't really indicate
this has anything to do with TCP. How about passing max in as an
argument, where all callers would currently pass MAX_TCP_HEADER?

>  I still need to add a failure return to indicate the pullup did not
> achieve the desired length though.
> 
>   Paul
> 
> > >
> > > >         }
> > > > +}
> > > >
> > > > -       /* A non-CHECKSUM_PARTIAL SKB does not require setup. */
> > > > -       if (skb->ip_summed != CHECKSUM_PARTIAL)
> > > > -               return 0;
> > > > +static int checksum_setup_ip(struct sk_buff *skb, int
> > > recalculate_partial_csum)
> > > > +{
> > >
> > > > +       struct iphdr *iph = (void *)skb->data;
> > > > +       unsigned int header_size;
> > > > +       unsigned int off;
> > > > +       int err = -EPROTO;
> > > >
> > > > -       if (skb->protocol != htons(ETH_P_IP))
> > > > -               goto out;
> > > > +       off = sizeof(struct iphdr);
> > > >
> > > > -       iph = (void *)skb->data;
> > > > +       header_size = skb->network_header + off + MAX_IPOPTLEN;
> > > > +       maybe_pull_tail(skb, header_size);
> > >
> > > You never reuse header_size other than setting it and immediately
> > > passing it to maybe_pull_tail, it could be inlined into the function
> > > call?
> > >
> > 
> > Yes, it could but I thought this was tidier. It's also the same as the code 
> > in
> > netback.
> > 
> > > (I found it confusing because you reset it in the recalculate_partial
> > > rather than extending it which is what I expected)
> > >
> > > maybe_pull_tail doesn't guarantee to pull up to the length of the given
> > > parameter, e.g. if it is more then skb->len. Which I think means you
> > > need some other explicit skb len checks around the place, don't you?
> > > Otherwise you risk running past the end of the skb for a malformed
> > > frame.
> > >
> > 
> > Yes, that's true. Probably best to have maybe_pull_tail() return a failure 
> > if it
> > doesn't managed to pullup the requested amount then.
> > 
> > > > +
> > > > +       off = iph->ihl * 4;
> > > >
> > > >         switch (iph->protocol) {
> > > >         case IPPROTO_TCP:
> > > > -               if (!skb_partial_csum_set(skb, 4 * iph->ihl,
> > > > +               if (!skb_partial_csum_set(skb, off,
> > > >                                           offsetof(struct tcphdr, 
> > > > check)))
> > > >                         goto out;
> > > >
> > > >                 if (recalculate_partial_csum) {
> > > >                         struct tcphdr *tcph = tcp_hdr(skb);
> > > > +
> > > > +                       header_size = skb->network_header +
> > > > +                               off +
> > > > +                               sizeof(struct tcphdr);
> > > > +                       maybe_pull_tail(skb, header_size);
> > > > +
> > > >                         tcph->check = ~csum_tcpudp_magic(iph->saddr, 
> > > > iph-
> > > >daddr,
> > > > -                                                        skb->len - 
> > > > iph->ihl*4,
> > > > +                                                        skb->len - off,
> > > >                                                          IPPROTO_TCP, 
> > > > 0);
> > > >                 }
> > > >                 break;
> > > >         case IPPROTO_UDP:
> > > > -               if (!skb_partial_csum_set(skb, 4 * iph->ihl,
> > > > +               if (!skb_partial_csum_set(skb, off,
> > > >                                           offsetof(struct udphdr, 
> > > > check)))
> > > >                         goto out;
> > > >
> > > >                 if (recalculate_partial_csum) {
> > > >                         struct udphdr *udph = udp_hdr(skb);
> > > > +
> > > > +                       header_size = skb->network_header +
> > > > +                               off +
> > > > +                               sizeof(struct udphdr);
> > > > +                       maybe_pull_tail(skb, header_size);
> > > > +
> > > >                         udph->check = ~csum_tcpudp_magic(iph->saddr,
> > > iph->daddr,
> > > > -                                                        skb->len - 
> > > > iph->ihl*4,
> > > > +                                                        skb->len - off,
> > > >                                                          IPPROTO_UDP, 
> > > > 0);
> > > >                 }
> > > >                 break;
> > > >         default:
> > > >                 if (net_ratelimit())
> > > > -                       pr_err("Attempting to checksum a non-TCP/UDP
> > > packet, dropping a protocol %d packet\n",
> > > > +                       pr_err("Attempting to checksum a non-TCP/UDP
> > > packet, "
> > > > +                              "dropping a protocol %d packet\n",
> > >
> > > I think Linux's coding style explicitly relaxes the 80-column limit for
> > > string literals.
> > >
> > 
> > Ok. Good to know.
> > 
> > > >                                iph->protocol);
> > > >                 goto out;
> > > >         }
> > > > @@ -922,6 +937,158 @@ out:
> > > >         return err;
> > > >  }
> > > >
> > > > +static int checksum_setup_ipv6(struct sk_buff *skb, int
> > > recalculate_partial_csum)
> > > > +{
> > > > +       int err = -EPROTO;
> > > > +       struct ipv6hdr *ipv6h = (void *)skb->data;
> > > > +       u8 nexthdr;
> > > > +       unsigned int header_size;
> > > > +       unsigned int off;
> > > > +       bool fragment;
> > > > +       bool done;
> > > > +
> > > > +       done = false;
> > > > +
> > > > +       off = sizeof(struct ipv6hdr);
> > > > +
> > > > +       header_size = skb->network_header + off;
> > > > +       maybe_pull_tail(skb, header_size);
> > >
> > > Same comment about skb lengths?
> > >
> > > > +
> > > > +       nexthdr = ipv6h->nexthdr;
> > > > +
> > > > +       while ((off <= sizeof(struct ipv6hdr) + 
> > > > ntohs(ipv6h->payload_len))
> > > &&
> > > > +              !done) {
> > > > +               switch (nexthdr) {
> > > > +               case IPPROTO_DSTOPTS:
> > > > +               case IPPROTO_HOPOPTS:
> > > > +               case IPPROTO_ROUTING: {
> > > > +                       struct ipv6_opt_hdr *hp = (void *)(skb->data + 
> > > > off);
> > > > +
> > > > +                       header_size = skb->network_header +
> > > > +                               off +
> > > > +                               sizeof(struct ipv6_opt_hdr);
> > > > +                       maybe_pull_tail(skb, header_size);
> > > > +
> > > > +                       nexthdr = hp->nexthdr;
> > > > +                       off += ipv6_optlen(hp);
> > > > +                       break;
> > > > +               }
> > > > +               case IPPROTO_AH: {
> > > > +                       struct ip_auth_hdr *hp = (void *)(skb->data + 
> > > > off);
> > > > +
> > > > +                       header_size = skb->network_header +
> > > > +                               off +
> > > > +                               sizeof(struct ip_auth_hdr);
> > > > +                       maybe_pull_tail(skb, header_size);
> > > > +
> > > > +                       nexthdr = hp->nexthdr;
> > > > +                       off += (hp->hdrlen+2)<<2;
> > >
> > > I wonder if the core header would welcome a ipv6_ahlen cf optlen?
> > >
> > 
> > Yes, it would be nicer wouldn't it.
> > 
> > > > +                       break;
> > > > +               }
> > > > +               case IPPROTO_FRAGMENT:
> > > > +                       fragment = true;
> > > > +                       /* fall through */
> > > > +               default:
> > > > +                       done = true;
> > > > +                       break;
> > > > +               }
> > > > +       }
> > > > +
> > > > +       if (!done) {
> > > > +               if (net_ratelimit())
> > > > +                       pr_err("Failed to parse packet header\n");
> > >
> > > Is it a requirement of the protocol that the list of IPPROTO_* we handle
> > > must be followed by some other header? Do we really care or is it more
> > > the upper stacks problem? Presumably it can get similarly malformed
> > > packets off the wire?
> > 
> > I'd have to check but I'm pretty sure none of the protos we handle can be
> > terminating so if done is not set then the packet is malformed.
> > 
> > >
> > > Also you can use net_err_ratelimited.
> > 
> > Ok.
> > 
> > >
> > > > +               goto out;
> > > > +       }
> > > > +
> > > > +       if (fragment) {
> > > > +               if (net_ratelimit())
> > > > +                       pr_err("Packet is a fragment!\n");
> > >
> > > Is that a bad thing?
> > >
> > 
> > Well, you can't do TCP or UDP checksum offload on a fragment, can you?
> > 
> > > > +               goto out;
> > > > +       }
> > > > +
> > > > +       switch (nexthdr) {
> > > > +       case IPPROTO_TCP:
> > > > +               if (!skb_partial_csum_set(skb, off,
> > > > +                                         offsetof(struct tcphdr, 
> > > > check)))
> > > > +                       goto out;
> > > > +
> > > > +               if (recalculate_partial_csum) {
> > > > +                       struct tcphdr *tcph = tcp_hdr(skb);
> > > > +
> > > > +                       header_size = skb->network_header +
> > > > +                               off +
> > > > +                               sizeof(struct tcphdr);
> > > > +                       maybe_pull_tail(skb, header_size);
> > > > +
> > > > +                       tcph->check = ~csum_ipv6_magic(&ipv6h->saddr,
> > > > +                                                      &ipv6h->daddr,
> > > > +                                                      skb->len - off,
> > > > +                                                      IPPROTO_TCP, 0);
> > > > +               }
> > > > +               break;
> > > > +       case IPPROTO_UDP:
> > > > +               if (!skb_partial_csum_set(skb, off,
> > > > +                                         offsetof(struct udphdr, 
> > > > check)))
> > > > +                       goto out;
> > > > +
> > > > +               if (recalculate_partial_csum) {
> > > > +                       struct udphdr *udph = udp_hdr(skb);
> > > > +
> > > > +                       header_size = skb->network_header +
> > > > +                               off +
> > > > +                               sizeof(struct udphdr);
> > > > +                       maybe_pull_tail(skb, header_size);
> > > > +
> > > > +                       udph->check = ~csum_ipv6_magic(&ipv6h->saddr,
> > > > +                                                      &ipv6h->daddr,
> > > > +                                                      skb->len - off,
> > > > +                                                      IPPROTO_UDP, 0);
> > > > +               }
> > > > +               break;
> > > > +       default:
> > > > +               if (net_ratelimit())
> > > > +                       pr_err("Attempting to checksum a non-TCP/UDP
> > > packet, "
> > > > +                              "dropping a protocol %d packet\n",
> > > > +                              nexthdr);
> > > > +               goto out;
> > > > +       }
> > > > +
> > > > +       err = 0;
> > > > +
> > > > +out:
> > > > +       return err;
> > > > +}
> > > > +
> > > > +static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
> > > > +{
> > > > +       int err = -EPROTO;
> > > > +       int recalculate_partial_csum = 0;
> > > > +
> > > > +       /*
> > > > +        * A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
> > > > +        * peers can fail to set NETRXF_csum_blank when sending a GSO
> > > > +        * frame. In this case force the SKB to CHECKSUM_PARTIAL and
> > > > +        * recalculate the partial checksum.
> > > > +        */
> > > > +       if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
> > >
> > > Is it worth making this ipv4 only. Assuming that anything which adds v6
> > > support has already fixed this bug? That would save time in
> > > checksum_setup_ipv6.
> > >
> > 
> > That's true.
> > 
> > > Speaking of which can checksum_setup_* be shared with netfront
> > > somehow?
> > > Or maybe even put in common code? (Perhaps as a future cleanup)
> > >
> > 
> > I think it really should be in common code, but I think that would be better
> > handled as a separate patch to remove the duplication.
> > 
> > Thanks for the comprehensive review!
> > 
> >   Paul
> > 
> > > > +               struct netfront_info *np = netdev_priv(dev);
> > > > +               np->rx_gso_checksum_fixup++;
> > > > +               skb->ip_summed = CHECKSUM_PARTIAL;
> > > > +               recalculate_partial_csum = 1;
> > > > +       }
> > > > +
> > > > +       /* A non-CHECKSUM_PARTIAL SKB does not require setup. */
> > > > +       if (skb->ip_summed != CHECKSUM_PARTIAL)
> > > > +               return 0;
> > > > +
> > > > +       if (skb->protocol == htons(ETH_P_IP))
> > > > +               err = checksum_setup_ip(skb, recalculate_partial_csum);
> > > > +       else if (skb->protocol == htons(ETH_P_IPV6))
> > > > +               err = checksum_setup_ipv6(skb, 
> > > > recalculate_partial_csum);
> > > > +
> > > > +       return err;
> > > > +}
> > > > +
> > > >  static int handle_incoming_queue(struct net_device *dev,
> > > [...]
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxx
> > http://lists.xen.org/xen-devel



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.