[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support for IPv6 offloads

To: Paul Durrant <paul.durrant@xxxxxxxxxx>
From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Date: Thu, 7 Nov 2013 10:48:40 +0000
Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, David Vrabel <david.vrabel@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxx
Delivery-date: Thu, 07 Nov 2013 11:50:47 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>
On Tue, 2013-11-05 at 11:59 +0000, Paul Durrant wrote:
> This patch adds support for IPv6 checksum offload and GSO when those
> fetaures are available in the backend.
> 
> Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
> ---
>  drivers/net/xen-netfront.c |  263 
> ++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 231 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index dd1011e..27212d8 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -616,7 +616,9 @@ static int xennet_start_xmit(struct sk_buff *skb, struct 
> net_device *dev)
>               tx->flags |= XEN_NETTXF_extra_info;
>  
>               gso->u.gso.size = skb_shinfo(skb)->gso_size;
> -             gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
> +             gso->u.gso.type = (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) ?
> +                               XEN_NETIF_GSO_TYPE_TCPV6 :
> +                               XEN_NETIF_GSO_TYPE_TCPV4;
>               gso->u.gso.pad = 0;
>               gso->u.gso.features = 0;
>  
> @@ -808,15 +810,18 @@ static int xennet_set_skb_gso(struct sk_buff *skb,
>               return -EINVAL;
>       }
>  
> -     /* Currently only TCPv4 S.O. is supported. */
> -     if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) {
> +     if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4 &&
> +         gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV6) {
>               if (net_ratelimit())
>                       pr_warn("Bad GSO type %d\n", gso->u.gso.type);
>               return -EINVAL;
>       }
>  
>       skb_shinfo(skb)->gso_size = gso->u.gso.size;
> -     skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
> +     skb_shinfo(skb)->gso_type =
> +             (gso->u.gso.type == XEN_NETIF_GSO_TYPE_TCPV4) ?
> +             SKB_GSO_TCPV4 :
> +             SKB_GSO_TCPV6;
>  
>       /* Header must be checked, and gso_segs computed. */
>       skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
> @@ -856,62 +861,72 @@ static RING_IDX xennet_fill_frags(struct netfront_info 
> *np,
>       return cons;
>  }
>  
> -static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
> +static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
>  {
> -     struct iphdr *iph;
> -     int err = -EPROTO;
> -     int recalculate_partial_csum = 0;
> -
> -     /*
> -      * A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
> -      * peers can fail to set NETRXF_csum_blank when sending a GSO
> -      * frame. In this case force the SKB to CHECKSUM_PARTIAL and
> -      * recalculate the partial checksum.
> -      */
> -     if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
> -             struct netfront_info *np = netdev_priv(dev);
> -             np->rx_gso_checksum_fixup++;
> -             skb->ip_summed = CHECKSUM_PARTIAL;
> -             recalculate_partial_csum = 1;
> +     if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
> +             /* If we need to pullup then pullup to the max, so we
> +              * won't need to do it again.
> +              */
> +             int target = min_t(int, skb->len, MAX_TCP_HEADER);
> +             __pskb_pull_tail(skb, target - skb_headlen(skb));

Should the functions "len" argument not be involved somewhere in this?
What if it is bigger than MAX_TCP_HEADER for example?

>       }
> +}
>  
> -     /* A non-CHECKSUM_PARTIAL SKB does not require setup. */
> -     if (skb->ip_summed != CHECKSUM_PARTIAL)
> -             return 0;
> +static int checksum_setup_ip(struct sk_buff *skb, int 
> recalculate_partial_csum)
> +{

> +     struct iphdr *iph = (void *)skb->data;
> +     unsigned int header_size;
> +     unsigned int off;
> +     int err = -EPROTO;
>  
> -     if (skb->protocol != htons(ETH_P_IP))
> -             goto out;
> +     off = sizeof(struct iphdr);
>  
> -     iph = (void *)skb->data;
> +     header_size = skb->network_header + off + MAX_IPOPTLEN;
> +     maybe_pull_tail(skb, header_size);

You never reuse header_size other than setting it and immediately
passing it to maybe_pull_tail, it could be inlined into the function
call?

(I found it confusing because you reset it in the recalculate_partial
rather than extending it which is what I expected)

maybe_pull_tail doesn't guarantee to pull up to the length of the given
parameter, e.g. if it is more then skb->len. Which I think means you
need some other explicit skb len checks around the place, don't you?
Otherwise you risk running past the end of the skb for a malformed
frame.

> +
> +     off = iph->ihl * 4;
>  
>       switch (iph->protocol) {
>       case IPPROTO_TCP:
> -             if (!skb_partial_csum_set(skb, 4 * iph->ihl,
> +             if (!skb_partial_csum_set(skb, off,
>                                         offsetof(struct tcphdr, check)))
>                       goto out;
>  
>               if (recalculate_partial_csum) {
>                       struct tcphdr *tcph = tcp_hdr(skb);
> +
> +                     header_size = skb->network_header +
> +                             off +
> +                             sizeof(struct tcphdr);
> +                     maybe_pull_tail(skb, header_size);
> +
>                       tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
> -                                                      skb->len - iph->ihl*4,
> +                                                      skb->len - off,
>                                                        IPPROTO_TCP, 0);
>               }
>               break;
>       case IPPROTO_UDP:
> -             if (!skb_partial_csum_set(skb, 4 * iph->ihl,
> +             if (!skb_partial_csum_set(skb, off,
>                                         offsetof(struct udphdr, check)))
>                       goto out;
>  
>               if (recalculate_partial_csum) {
>                       struct udphdr *udph = udp_hdr(skb);
> +
> +                     header_size = skb->network_header +
> +                             off +
> +                             sizeof(struct udphdr);
> +                     maybe_pull_tail(skb, header_size);
> +
>                       udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
> -                                                      skb->len - iph->ihl*4,
> +                                                      skb->len - off,
>                                                        IPPROTO_UDP, 0);
>               }
>               break;
>       default:
>               if (net_ratelimit())
> -                     pr_err("Attempting to checksum a non-TCP/UDP packet, 
> dropping a protocol %d packet\n",
> +                     pr_err("Attempting to checksum a non-TCP/UDP packet, "
> +                            "dropping a protocol %d packet\n",

I think Linux's coding style explicitly relaxes the 80-column limit for
string literals.

>                              iph->protocol);
>               goto out;
>       }
> @@ -922,6 +937,158 @@ out:
>       return err;
>  }
>  
> +static int checksum_setup_ipv6(struct sk_buff *skb, int 
> recalculate_partial_csum)
> +{
> +     int err = -EPROTO;
> +     struct ipv6hdr *ipv6h = (void *)skb->data;
> +     u8 nexthdr;
> +     unsigned int header_size;
> +     unsigned int off;
> +     bool fragment;
> +     bool done;
> +
> +     done = false;
> +
> +     off = sizeof(struct ipv6hdr);
> +
> +     header_size = skb->network_header + off;
> +     maybe_pull_tail(skb, header_size);

Same comment about skb lengths?

> +
> +     nexthdr = ipv6h->nexthdr;
> +
> +     while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
> +            !done) {
> +             switch (nexthdr) {
> +             case IPPROTO_DSTOPTS:
> +             case IPPROTO_HOPOPTS:
> +             case IPPROTO_ROUTING: {
> +                     struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> +
> +                     header_size = skb->network_header +
> +                             off +
> +                             sizeof(struct ipv6_opt_hdr);
> +                     maybe_pull_tail(skb, header_size);
> +
> +                     nexthdr = hp->nexthdr;
> +                     off += ipv6_optlen(hp);
> +                     break;
> +             }
> +             case IPPROTO_AH: {
> +                     struct ip_auth_hdr *hp = (void *)(skb->data + off);
> +
> +                     header_size = skb->network_header +
> +                             off +
> +                             sizeof(struct ip_auth_hdr);
> +                     maybe_pull_tail(skb, header_size);
> +
> +                     nexthdr = hp->nexthdr;
> +                     off += (hp->hdrlen+2)<<2;

I wonder if the core header would welcome a ipv6_ahlen cf optlen?

> +                     break;
> +             }
> +             case IPPROTO_FRAGMENT:
> +                     fragment = true;
> +                     /* fall through */
> +             default:
> +                     done = true;
> +                     break;
> +             }
> +     }
> +
> +     if (!done) {
> +             if (net_ratelimit())
> +                     pr_err("Failed to parse packet header\n");

Is it a requirement of the protocol that the list of IPPROTO_* we handle
must be followed by some other header? Do we really care or is it more
the upper stacks problem? Presumably it can get similarly malformed
packets off the wire?

Also you can use net_err_ratelimited.

> +             goto out;
> +     }
> +
> +     if (fragment) {
> +             if (net_ratelimit())
> +                     pr_err("Packet is a fragment!\n");

Is that a bad thing?

> +             goto out;
> +     }
> +
> +     switch (nexthdr) {
> +     case IPPROTO_TCP:
> +             if (!skb_partial_csum_set(skb, off,
> +                                       offsetof(struct tcphdr, check)))
> +                     goto out;
> +
> +             if (recalculate_partial_csum) {
> +                     struct tcphdr *tcph = tcp_hdr(skb);
> +
> +                     header_size = skb->network_header +
> +                             off +
> +                             sizeof(struct tcphdr);
> +                     maybe_pull_tail(skb, header_size);
> +
> +                     tcph->check = ~csum_ipv6_magic(&ipv6h->saddr,
> +                                                    &ipv6h->daddr,
> +                                                    skb->len - off,
> +                                                    IPPROTO_TCP, 0);
> +             }
> +             break;
> +     case IPPROTO_UDP:
> +             if (!skb_partial_csum_set(skb, off,
> +                                       offsetof(struct udphdr, check)))
> +                     goto out;
> +
> +             if (recalculate_partial_csum) {
> +                     struct udphdr *udph = udp_hdr(skb);
> +
> +                     header_size = skb->network_header +
> +                             off +
> +                             sizeof(struct udphdr);
> +                     maybe_pull_tail(skb, header_size);
> +
> +                     udph->check = ~csum_ipv6_magic(&ipv6h->saddr,
> +                                                    &ipv6h->daddr,
> +                                                    skb->len - off,
> +                                                    IPPROTO_UDP, 0);
> +             }
> +             break;
> +     default:
> +             if (net_ratelimit())
> +                     pr_err("Attempting to checksum a non-TCP/UDP packet, "
> +                            "dropping a protocol %d packet\n",
> +                            nexthdr);
> +             goto out;
> +     }
> +
> +     err = 0;
> +
> +out:
> +     return err;
> +}
> +
> +static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
> +{
> +     int err = -EPROTO;
> +     int recalculate_partial_csum = 0;
> +
> +     /*
> +      * A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
> +      * peers can fail to set NETRXF_csum_blank when sending a GSO
> +      * frame. In this case force the SKB to CHECKSUM_PARTIAL and
> +      * recalculate the partial checksum.
> +      */
> +     if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {

Is it worth making this ipv4 only. Assuming that anything which adds v6
support has already fixed this bug? That would save time in
checksum_setup_ipv6.

Speaking of which can checksum_setup_* be shared with netfront somehow?
Or maybe even put in common code? (Perhaps as a future cleanup)

> +             struct netfront_info *np = netdev_priv(dev);
> +             np->rx_gso_checksum_fixup++;
> +             skb->ip_summed = CHECKSUM_PARTIAL;
> +             recalculate_partial_csum = 1;
> +     }
> +
> +     /* A non-CHECKSUM_PARTIAL SKB does not require setup. */
> +     if (skb->ip_summed != CHECKSUM_PARTIAL)
> +             return 0;
> +
> +     if (skb->protocol == htons(ETH_P_IP))
> +             err = checksum_setup_ip(skb, recalculate_partial_csum);
> +     else if (skb->protocol == htons(ETH_P_IPV6))
> +             err = checksum_setup_ipv6(skb, recalculate_partial_csum);
> +
> +     return err;
> +}
> +
>  static int handle_incoming_queue(struct net_device *dev,
[...]


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
Follow-Ups:
- Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support for IPv6 offloads
  - From: Paul Durrant
References:
- [Xen-devel] [PATCH net-next] xen-netfront: Add support for IPv6 offloads
  - From: Paul Durrant
Prev by Date: [Xen-devel] [PATCH] VMX: don't crash processing 'd' debug key
Next by Date: Re: [Xen-devel] [PATCH v14 03/17] pvh prep: code motion
Previous by thread: [Xen-devel] [PATCH net-next] xen-netfront: Add support for IPv6 offloads
Next by thread: Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support for IPv6 offloads
Index(es):
- Date
- Thread
Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.