Re: [Xen-devel] [DRAFT v5] PV Calls protocol design document (former XenSock)
Thank you! On Sun, 21 Aug 2016, Christopher Clark wrote: > The PV Calls (formerly XenSock) protocol design has recently attracted > interest within the OpenXT software > development community, as it is a novel interdomain communication mechanism > and protocol. > > We have a longstanding and active interest in interdomain communication > transport. v4v is a protocol and mechanism > originally designed at Citrix for XenClient and XenClient XT, now in use by > OpenXT, and has been deployed in > production systems for several years at this point. It has benefitted from > previous reviews by the Xen Community, > and continued engagement from its original designers, and is our selection of > protocol and implementation to meet > the particular software requirements of our project. Members of the OpenXT > community also have interdomain > protocol experience through being involved in the development of both vchan > and the XSM architecture. > > The PV Calls (formerly XenSock) work described in this thread is interesting > and quite a different technology, > evidently with a different set of design constraints and focus on solving a > different problem: it enables the > insertion of VM boundaries and separation in-between components that are > communicating via the POSIX socket API, > which could not otherwise be so isolated from each other. > > In contrast, v4v prioritizes isolation between VMs over seamless integration > of existing components, preferring no > memory sharing between VMs, and mandatory access control enforced on > communication channels by the hypervisor. > > So while PV Calls is not a fit for the needs of OpenXT, this proposed > approach looks very good for the problem > that it aims to address. It is complementary to the other mechanisms > available in the Xen ecosystem. > > I would like to convey my positive support for the PV Calls proposal. 
> > Christopher Clark > BAE Systems, OpenXT Project > http://openxt.org > > > On Thu, Aug 4, 2016 at 10:17 AM, Stefano Stabellini <stefano@xxxxxxxxxxx> > wrote: > Hi all, > > This is the design document of the PV Calls protocol. You can find > prototypes of the Linux frontend and backend drivers here: > > git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git > pvcalls-5 > > To use them, make sure to enable CONFIG_PVCALLS in your kernel config > and add "pvcalls=1" to the command line of your DomU Linux kernel. You > also need the toolstack to create the initial xenstore nodes for the > protocol. To do that, please apply the attached patch to libxl (the > patch is based on Xen 4.7.0-rc3) and add "pvcalls=1" to your DomU config > file. > > Note that previous versions of the protocols were named XenSock. It has > been renamed for clarity of scope and to avoid confusion with hv_sock > and vsock, which are used for inter-VMs communications. > > Cheers, > > Stefano > > Changes in v5: > - clarify text > - rename id to req_id > - rename sockid to id > - move id to request and response specific fields > - add version node to xenstore > > Changes in v4: > - rename xensock to pvcalls > > Changes in v3: > - add a dummy element to struct xen_xensock_request to make sure the > size of the struct is the same on both x86_32 and x86_64 > > Changes in v2: > - add max-dataring-page-order > - add "Publish backend features and transport parameters" to backend > xenbus workflow > - update new cmd values > - update xen_xensock_request > - add backlog parameter to listen and binary layout > - add description of new data ring format (interface+data) > - modify connect and accept to reflect new data ring format > - add link to POSIX docs > - add error numbers > - add address format section and relevant numeric definitions > - add explicit mention of unimplemented commands > - add protocol node name > - add xenbus shutdown diagram > - add socket operation > > --- > > # PV Calls 
Protocol version 1 > > ## Rationale > > PV Calls is a paravirtualized protocol that allows the implementation of > a set of POSIX functions in a different domain. The PV Calls frontend > sends POSIX function calls to the backend, which implements them and > returns a value to the frontend. > > This version of the document covers networking function calls, such as > connect, accept, bind, release, listen, poll, recvmsg and sendmsg; but > the protocol is meant to be easily extended to cover different sets of > calls. Unimplemented commands return ENOTSUPP. > > PV Calls provide the following benefits: > * full visibility of the guest behavior on the backend domain, allowing > for inexpensive filtering and manipulation of any guest calls > * excellent performance > > Specifically, PV Calls for networking offer these advantages: > * guest networking works out of the box with VPNs, wireless networks and > any other complex configurations on the host > * guest services listen on ports bound directly to the backend domain IP > addresses > * localhost becomes a secure namespace for inter-VMs communications > > > ## Design > > ### Xenstore > > The frontend and the backend connect to each other exchanging > information via > xenstore. The toolstack creates front and back nodes with state > XenbusStateInitialising. The protocol node name is **pvcalls**. There > can only > be one PV Calls frontend per domain. > > #### Frontend XenBus Nodes > > port > Values: <uint32_t> > > The identifier of the Xen event channel used to signal activity > in the ring buffer. > > ring-ref > Values: <uint32_t> > > The Xen grant reference granting permission for the backend to map > the sole page in a single page sized ring buffer. > > #### Backend XenBus Nodes > > version > Values: <uint32_t> > > Protocol version supported by the backend. > > max-dataring-page-order > Values: <uint32_t> > > The maximum supported size of the data ring in units of lb(machine > pages). (e.g. 
0 == 1 page, 1 == 2 pages, 2 == 4 pages, etc.). > > #### State Machine > > Initialization: > > *Front* *Back* > XenbusStateInitialising XenbusStateInitialising > - Query virtual device - Query backend device > properties. identification data. > - Setup OS device instance. - Publish backend features > - Allocate and initialize the and transport parameters > request ring. | > - Publish transport parameters | > that will be in effect during V > this connection. XenbusStateInitWait > | > | > V > XenbusStateInitialised > > - Query frontend transport > parameters. > - Connect to the request ring > and > event channel. > | > | > V > XenbusStateConnected > > - Query backend device properties. > - Finalize OS virtual device > instance. > | > | > V > XenbusStateConnected > > Once frontend and backend are connected, they have a shared page, which > is used to exchange messages over a ring, and an event channel, > which is used to send notifications. > > Shutdown: > > *Front* *Back* > XenbusStateConnected XenbusStateConnected > | > | > V > XenbusStateClosing > > - Unmap grants > - Unbind evtchns > | > | > V > XenbusStateClosing > > - Unbind evtchns > - Free rings > - Free data structures > | > | > V > XenbusStateClosed > > - Free remaining data structures > | > | > V > XenbusStateClosed > > > ### Commands Ring > > The shared ring is used by the frontend to forward POSIX function calls > to the > backend. I'll refer to this ring as **commands ring** to distinguish it > from > other rings which can be created later in the lifecycle of the protocol > (data > rings). The ring format is defined using the familiar > `DEFINE_RING_TYPES` macro > (`xen/include/public/io/ring.h`). Frontend requests are allocated on > the ring > using the `RING_GET_REQUEST` macro. 
> > The format is defined as follows: > > #define PVCALLS_SOCKET 0 > #define PVCALLS_CONNECT 1 > #define PVCALLS_RELEASE 2 > #define PVCALLS_BIND 3 > #define PVCALLS_LISTEN 4 > #define PVCALLS_ACCEPT 5 > #define PVCALLS_POLL 6 > > struct xen_pvcalls_request { > uint32_t req_id; /* private to guest, echoed in response */ > uint32_t cmd; /* command to execute */ > union { > struct xen_pvcalls_socket { > uint64_t id; > uint32_t domain; > uint32_t type; > uint32_t protocol; > } socket; > struct xen_pvcalls_connect { > uint64_t id; > uint8_t addr[28]; > uint32_t len; > uint32_t flags; > grant_ref_t ref; > uint32_t evtchn; > } connect; > struct xen_pvcalls_release { > uint64_t id; > } release; > struct xen_pvcalls_bind { > uint64_t id; > uint8_t addr[28]; > uint32_t len; > } bind; > struct xen_pvcalls_listen { > uint64_t id; > uint32_t backlog; > } listen; > struct xen_pvcalls_accept { > uint64_t id; > uint64_t id_new; > grant_ref_t ref; > uint32_t evtchn; > } accept; > struct xen_pvcalls_poll { > uint64_t id; > } poll; > /* dummy member to force sizeof(struct > xen_pvcalls_request) to match across archs */ > struct xen_pvcalls_dummy { > uint8_t dummy[56]; > } dummy; > } u; > }; > > The first two fields are common for every command. Their binary layout > is: > > 0 4 8 > +-------+-------+ > |req_id | cmd | > +-------+-------+ > > - **req_id** is generated by the frontend and identifies one specific > request > - **cmd** is the command requested by the frontend: > > - `PVCALLS_SOCKET`: 0 > - `PVCALLS_CONNECT`: 1 > - `PVCALLS_RELEASE`: 2 > - `PVCALLS_BIND`: 3 > - `PVCALLS_LISTEN`: 4 > - `PVCALLS_ACCEPT`: 5 > - `PVCALLS_POLL`: 6 > > Both fields are echoed back by the backend. > > As for the other Xen ring based protocols, after writing a request to > the ring, > the frontend calls `RING_PUSH_REQUESTS_AND_CHECK_NOTIFY` and issues an > event > channel notification when a notification is required. 
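As a rough check of the request layout described above, the sketch below restates `struct xen_pvcalls_request` and tests the offsets the document implies: the common header occupies bytes 0 to 8, the union starts at byte 8, and the 56-byte dummy member pins the total size at 64 bytes. It assumes `grant_ref_t` is a 32-bit integer, as in the Xen public headers; `pvcalls_request_layout_ok` is a hypothetical helper name, not part of the protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t grant_ref_t; /* assumption: 32-bit, as in Xen's public headers */

struct xen_pvcalls_request {
    uint32_t req_id; /* private to guest, echoed in response */
    uint32_t cmd;    /* command to execute */
    union {
        struct { uint64_t id; uint32_t domain, type, protocol; } socket;
        struct {
            uint64_t id;
            uint8_t addr[28];
            uint32_t len, flags;
            grant_ref_t ref;
            uint32_t evtchn;
        } connect;
        struct { uint64_t id; } release;
        struct { uint64_t id; uint8_t addr[28]; uint32_t len; } bind;
        struct { uint64_t id; uint32_t backlog; } listen;
        struct { uint64_t id, id_new; grant_ref_t ref; uint32_t evtchn; } accept;
        struct { uint64_t id; } poll;
        struct { uint8_t dummy[56]; } dummy; /* pins the union size */
    } u;
};

/* Hypothetical helper: returns 1 when the compiled layout matches the document. */
static int pvcalls_request_layout_ok(void)
{
    return sizeof(struct xen_pvcalls_request) == 64 &&
           offsetof(struct xen_pvcalls_request, u) == 8 &&
           offsetof(struct xen_pvcalls_request, u.connect.evtchn) == 56;
}
```

Without the dummy member, the largest member (connect, 52 bytes of fields) would be padded to 56 bytes on x86_64 but left at 52 on x86_32, which is the mismatch the v3 changelog entry refers to.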
> > Backend responses are allocated on the ring using the > `RING_GET_RESPONSE` macro. > The format is the following: > > struct xen_pvcalls_response { > uint32_t req_id; > uint32_t cmd; > int32_t ret; > uint32_t pad; > union { > struct _xen_pvcalls_socket { > uint64_t id; > } socket; > struct _xen_pvcalls_connect { > uint64_t id; > } connect; > struct _xen_pvcalls_release { > uint64_t id; > } release; > struct _xen_pvcalls_bind { > uint64_t id; > } bind; > struct _xen_pvcalls_listen { > uint64_t id; > } listen; > struct _xen_pvcalls_accept { > uint64_t id; > } accept; > struct _xen_pvcalls_poll { > uint64_t id; > } poll; > struct _xen_pvcalls_dummy { > uint8_t dummy[8]; > } dummy; > } u; > }; > > The first four fields are common for every response. Their binary layout > is: > > 0 4 8 12 16 > +-------+-------+-------+-------+ > |req_id | cmd | ret | pad | > +-------+-------+-------+-------+ > > - **req_id**: echoed back from request > - **cmd**: echoed back from request > - **ret**: return value, identifies success (0) or failure (see error > numbers > below). If the **cmd** is not supported by the backend, ret is > ENOTSUPP. > - **pad**: padding > > After calling `RING_PUSH_RESPONSES_AND_CHECK_NOTIFY`, the backend > checks whether > it needs to notify the frontend and does so via event channel. > > A description of each command, their additional request and response > fields follow. > > > #### Socket > > The **socket** operation corresponds to the POSIX [socket][socket] > function. It > creates a new socket of the specified family, type and protocol. **id** > is > freely chosen by the frontend and references this specific socket from > this > point forward. See "Socket families and address format" below. 
> > Request fields: > > - **cmd** value: 0 > - additional fields: > - **id**: generated by the frontend, it identifies the new socket > - **domain**: the communication domain > - **type**: the socket type > - **protocol**: the particular protocol to be used with the socket, > usually 0 > > Request binary layout: > > 8 12 16 20 24 28 > +-------+-------+-------+-------+-------+ > | id |domain | type |protoco| > +-------+-------+-------+-------+-------+ > > Response additional fields: > > - **id**: echoed back from request > > Response binary layout: > > 16 20 24 > +-------+--------+ > | id | > +-------+--------+ > > Return value: > > - 0 on success > - See the [POSIX socket function][socket] for error names; the > corresponding > error numbers are specified later in this document. > > #### Connect > > The **connect** operation corresponds to the POSIX [connect][connect] > function. > It connects a previously created socket (identified by **id**) to the > specified address. > > The connect operation creates a new shared ring, which we'll call **data > ring**. The data ring is used to send and receive data from the socket. > The connect operation passes two additional parameters which are > utilized to set up the new ring: **evtchn** and **ref**. **evtchn** is > the > port number of a new event channel which will be used for notifications > of activity on the data ring. **ref** is the grant reference of a page > which contains shared pointers to write and read data from the data > ring > and the full array of grant references for the ring buffers. It will be > described in more detail later. The data ring is unmapped and freed > upon > issuing a **release** command on the active socket identified by **id**. 
> > When the frontend issues a **connect** command, the backend: > - finds its own internal socket corresponding to **id** > - connects the socket to **addr** > - maps the grant reference **ref**; the shared page contains the data > ring interface (`struct pvcalls_data_intf`) > - maps all the grant references listed in `struct pvcalls_data_intf` and > uses them as shared memory for the ring buffers > - binds the **evtchn** > - replies to the frontend > > The data ring format will be described in the following section. > > Request fields: > > - **cmd** value: 1 > - additional fields: > - **id**: identifies the socket > - **addr**: address to connect to, see the address format section for > more > information > - **len**: address length > - **flags**: flags for the connection, reserved for future usage > - **ref**: grant reference of the page containing `struct > pvcalls_data_intf` > - **evtchn**: port number of the evtchn to signal activity on the > data ring > > Request binary layout: > > 8 12 16 20 24 28 32 36 40 > 44 > > +-------+-------+-------+-------+-------+-------+-------+-------+-------+ > | id | addr > | > > +-------+-------+-------+-------+-------+-------+-------+-------+-------+ > | len | flags | ref |evtchn | > +-------+-------+-------+-------+ > > Response additional fields: > > - **id**: echoed back from request > > Response binary layout: > > 16 20 24 > +-------+-------+ > | id | > +-------+-------+ > > Return value: > > - 0 on success > - See the [POSIX connect function][connect] for error names; the > corresponding > error numbers are specified later in this document. > > #### Release > > The **release** operation closes an existing active or a passive socket. > > When a release command is issued on a passive socket, the backend > releases it > and frees its internal mappings. 
When a release command is issued for > an active > socket, the data ring is also unmapped and freed: > > - frontend sends release command for an active socket > - backend releases the socket > - backend unmaps the data ring buffers > - backend unmaps the data ring interface > - backend unbinds the evtchn > - backend replies to frontend > - frontend frees ring and unbinds evtchn > > Request fields: > > - **cmd** value: 2 > - additional fields: > - **id**: identifies the socket > > Request binary layout: > > 8 12 16 > +-------+-------+ > | id | > +-------+-------+ > > Response additional fields: > > - **id**: echoed back from request > > Response binary layout: > > 16 20 24 > +-------+-------+ > | id | > +-------+-------+ > > Return value: > > - 0 on success > - See the [POSIX shutdown function][shutdown] for error names; the > corresponding error numbers are specified later in this document. > > #### Bind > > The **bind** operation corresponds to the POSIX [bind][bind] function. > It > assigns the address passed as parameter to a previously created socket, > identified by **id**. **Bind**, **listen** and **accept** are the three > operations required to have fully working passive sockets and should be > issued in this order. 
> > Request fields: > > - **cmd** value: 3 > - additional fields: > - **id**: identifies the socket > - **addr**: address to bind to, see the address format section for > more > information > - **len**: address length > > Request binary layout: > > 8 12 16 20 24 28 32 36 40 > 44 > > +-------+-------+-------+-------+-------+-------+-------+-------+-------+ > | id | addr > | > > +-------+-------+-------+-------+-------+-------+-------+-------+-------+ > | len | > +-------+ > > Response additional fields: > > - **id**: echoed back from request > > Response binary layout: > > 16 20 24 > +-------+-------+ > | id | > +-------+-------+ > > Return value: > > - 0 on success > - See the [POSIX bind function][bind] for error names; the > corresponding error > numbers are specified later in this document. > > > #### Listen > > The **listen** operation marks the socket as a passive socket. It > corresponds to > the [POSIX listen function][listen]. > > Request fields: > > - **cmd** value: 4 > - additional fields: > - **id**: identifies the socket > - **backlog**: the maximum length to which the queue of pending > connections may grow > > Request binary layout: > > 8 12 16 20 > +-------+-------+-------+ > | id |backlog| > +-------+-------+-------+ > > Response additional fields: > > - **id**: echoed back from request > > Response binary layout: > > 16 20 24 > +-------+-------+ > | id | > +-------+-------+ > > Return value: > - 0 on success > - See the [POSIX listen function][listen] for error names; the > corresponding > error numbers are specified later in this document. > > > #### Accept > > The **accept** operation extracts the first connection request on the > queue of pending connections for the listening socket identified by > **id** and creates a new connected socket. The id of the new socket is > also chosen by the frontend and passed as an additional field of the > accept request struct (**id_new**). See the [POSIX accept > function][accept] > as reference. 
> > Similarly to the **connect** operation, **accept** creates a new data > ring. > Information necessary to set up the new ring, such as the grant table > reference of > the page containing the data ring interface (`struct > pvcalls_data_intf`) and > event channel port, are passed from the frontend to the backend as part > of the > request. > > The backend will reply to the request only when a new connection is > successfully > accepted, i.e. the backend does not return EAGAIN or EWOULDBLOCK. > > Example workflow: > > - frontend issues an **accept** request > - backend waits for a connection to be available on the socket > - a new connection becomes available > - backend accepts the new connection > - backend creates an internal mapping from **id_new** to the new socket > - backend maps the grant reference **ref**; the shared page contains the > data ring interface (`struct pvcalls_data_intf`) > - backend maps all the grant references listed in `struct > pvcalls_data_intf` and uses them as shared memory for the new data > ring > - backend binds the **evtchn** > - backend replies to the frontend > > Request fields: > > - **cmd** value: 5 > - additional fields: > - **id**: id of listening socket > - **id_new**: id of the new socket > - **ref**: grant reference of the data ring interface (`struct > pvcalls_data_intf`) > - **evtchn**: port number of the evtchn to signal activity on the > data ring > > Request binary layout: > > 8 12 16 20 24 28 32 > +-------+-------+-------+-------+-------+-------+ > | id | id_new | ref |evtchn | > +-------+-------+-------+-------+-------+-------+ > > Response additional fields: > > - **id**: id of the listening socket, echoed back from request > > Response binary layout: > > 16 20 24 > +-------+-------+ > | id | > +-------+-------+ > > Return value: > > - 0 on success > - See the [POSIX accept function][accept] for error names; the > corresponding > error numbers are specified later in this document. 
> > > #### Poll > > In this version of the protocol, the **poll** operation is only valid > for passive sockets. For active sockets, the frontend should look at the > state of the data ring. When a new connection is available in the queue > of the passive socket, the backend generates a response and notifies the > frontend. > > Request fields: > > - **cmd** value: 6 > - additional fields: > - **id**: identifies the listening socket > > Request binary layout: > > 8 12 16 > +-------+-------+ > | id | > +-------+-------+ > > > Response additional fields: > > - **id**: echoed back from request > > Response binary layout: > > 16 20 24 > +--------+--------+ > | id | > +--------+--------+ > > Return value: > > - 0 on success > - See the [POSIX poll function][poll] for error names; the > corresponding error > numbers are specified later in this document. > > #### Error numbers > > The numbers corresponding to the error names specified by POSIX are: > > [EPERM] -1 > [ENOENT] -2 > [ESRCH] -3 > [EINTR] -4 > [EIO] -5 > [ENXIO] -6 > [E2BIG] -7 > [ENOEXEC] -8 > [EBADF] -9 > [ECHILD] -10 > [EAGAIN] -11 > [EWOULDBLOCK] -11 > [ENOMEM] -12 > [EACCES] -13 > [EFAULT] -14 > [EBUSY] -16 > [EEXIST] -17 > [EXDEV] -18 > [ENODEV] -19 > [EISDIR] -21 > [EINVAL] -22 > [ENFILE] -23 > [EMFILE] -24 > [ENOSPC] -28 > [EROFS] -30 > [EMLINK] -31 > [EDOM] -33 > [ERANGE] -34 > [EDEADLK] -35 > [EDEADLOCK] -35 > [ENAMETOOLONG] -36 > [ENOLCK] -37 > [ENOSYS] -38 > [ENOTEMPTY] -39 > [ENODATA] -61 > [ETIME] -62 > [EBADMSG] -74 > [EOVERFLOW] -75 > [EILSEQ] -84 > [ERESTART] -85 > [ENOTSOCK] -88 > [EOPNOTSUPP] -95 > [EAFNOSUPPORT] -97 > [EADDRINUSE] -98 > [EADDRNOTAVAIL] -99 > [ENOBUFS] -105 > [EISCONN] -106 > [ENOTCONN] -107 > [ETIMEDOUT] -110 > [ENOTSUPP] -524 > > #### Socket families and address format > > The following definitions and explicit sizes, together with POSIX > [sys/socket.h][address] and [netinet/in.h][in] define socket families > and > address format. 
Please be aware that only the **domain** `AF_INET`, > **type** > `SOCK_STREAM` and **protocol** `0` are supported by this version of the > spec. > > #define AF_UNSPEC 0 > #define AF_UNIX 1 /* Unix domain sockets */ > #define AF_LOCAL 1 /* POSIX name for AF_UNIX */ > #define AF_INET 2 /* Internet IP Protocol */ > #define AF_INET6 10 /* IP version 6 */ > > #define SOCK_STREAM 1 > #define SOCK_DGRAM 2 > #define SOCK_RAW 3 > > /* generic address format */ > struct sockaddr { > uint16_t sa_family_t; > char sa_data[26]; > }; > > struct in_addr { > uint32_t s_addr; > }; > > /* AF_INET address format */ > struct sockaddr_in { > uint16_t sa_family_t; > uint16_t sin_port; > struct in_addr sin_addr; > char sin_zero[20]; > }; > > > ### Data ring > > Data rings are used for sending and receiving data over a connected > socket. They > are created upon a successful **accept** or **connect** command. > > A data ring is composed of two pieces: the interface and the **in** and > **out** > buffers. The interface, represented by `struct pvcalls_data_intf`, is > shared > first and resides on the page whose grant reference is passed by > **accept** and > **connect** as parameter. `struct pvcalls_data_intf` contains the list > of grant > references which constitute the **in** and **out** data buffers. > > #### Data ring interface > > struct pvcalls_data_intf { > PVCALLS_RING_IDX in_cons, in_prod; > PVCALLS_RING_IDX out_cons, out_prod; > int32_t in_error, out_error; > > uint32_t ring_order; > grant_ref_t ref[]; > }; > > /* not actually C compliant (ring_order changes from socket to > socket) */ > struct pvcalls_data { > char in[((1<<ring_order)<<PAGE_SHIFT)/2]; > char out[((1<<ring_order)<<PAGE_SHIFT)/2]; > }; > > - **ring_order** > It represents the order of the data ring. The following list of grant > references is of `(1 << ring_order)` elements. It cannot be greater > than > **max-dataring-page-order**, as specified by the backend on XenBus. 
> - **ref[]** > The list of grant references which will contain the actual data. They > are > mapped contiguously in virtual memory. The first half of the pages is > the > **in** array, the second half is the **out** array. > - **in** is an array used as circular buffer > It contains data read from the socket. The producer is the backend, > the > consumer is the frontend. > - **out** is an array used as circular buffer > It contains data to be written to the socket. The producer is the > frontend, > the consumer is the backend. > - **in_cons** and **in_prod** > Consumer and producer pointers for data read from the socket. They > keep track > of how much data has already been consumed by the frontend from the > **in** > array. **in_prod** is increased by the backend, after writing data to > **in**. > **in_cons** is increased by the frontend, after reading data from > **in**. > - **out_cons**, **out_prod** > Consumer and producer pointers for the data to be written to the > socket. They > keep track of how much data has been written by the frontend to > **out** and > how much data has already been consumed by the backend. **out_prod** > is > increased by the frontend, after writing data to **out**. > **out_cons** is > increased by the backend, after reading data from **out**. > - **in_error** and **out_error** They signal errors when reading from > the socket > (**in_error**) or when writing to the socket (**out_error**). 0 means > no > errors. When an error occurs, no further read or write operations > are > performed on the socket. In the case of an orderly socket shutdown > (i.e. read > returns 0) **in_error** is set to ENOTCONN. **in_error** and > **out_error** > are never set to EAGAIN or EWOULDBLOCK. 
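To make the sizing rules in the field descriptions above concrete, here is a small sketch of the arithmetic. It assumes 4 KiB pages and 32-bit `PVCALLS_RING_IDX` and `grant_ref_t`, so that the seven fixed fields of `struct pvcalls_data_intf` occupy 28 bytes before `ref[]`. All `demo_`/`DEMO_` names are hypothetical, and the check that the grant list must fit in a single interface page is an inference from the description, not a rule stated by the protocol.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT  12                       /* assumption: 4 KiB pages */
#define DEMO_PAGE_SIZE   (1u << DEMO_PAGE_SHIFT)
#define DEMO_INTF_HEADER 28u                      /* seven 32-bit fields before ref[] */

/* Number of pages (and grant references) in a data ring of this order. */
static uint32_t demo_ring_pages(uint32_t ring_order)
{
    return 1u << ring_order;
}

/* Bytes available to each of the in and out arrays (half the ring each). */
static uint32_t demo_array_size(uint32_t ring_order)
{
    return (demo_ring_pages(ring_order) << DEMO_PAGE_SHIFT) / 2;
}

/*
 * Returns 1 when ring_order respects the backend's max-dataring-page-order
 * and the grant list still fits in the single interface page.
 */
static int demo_ring_order_ok(uint32_t ring_order, uint32_t max_order)
{
    uint32_t max_refs = (DEMO_PAGE_SIZE - DEMO_INTF_HEADER) / sizeof(uint32_t);

    if (ring_order > max_order || ring_order > 31)
        return 0;
    return demo_ring_pages(ring_order) <= max_refs;
}
```

Under these assumptions a ring of order 0 gives each direction half a page, and the interface page has room for 1017 grant references, so order 9 (512 pages) is the largest that fits.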
> > The binary layout of `struct pvcalls_data_intf` follows: > > 0 4 8 12 16 20 24 > 28 > > +---------+---------+---------+---------+---------+---------+----------+ > | in_cons | in_prod |out_cons |out_prod |in_error > |out_error|ring_order| > > +---------+---------+---------+---------+---------+---------+----------+ > > 28 32 36 40 4092 4096 > +---------+---------+---------+----//---+---------+ > | ref[0] | ref[1] | ref[2] | | ref[N] | > +---------+---------+---------+----//---+---------+ > > The binary layout of the ring buffers follow: > > 0 ((1<<ring_order)<<PAGE_SHIFT)/2 > ((1<<ring_order)<<PAGE_SHIFT) > +------------//-------------+------------//-------------+ > | in | out | > +------------//-------------+------------//-------------+ > > #### Workflow > > The **in** and **out** arrays are used as circular buffers: > > 0 sizeof(array) == > ((1<<ring_order)<<PAGE_SHIFT)/2 > +-----------------------------------+ > |to consume| free |to consume | > +-----------------------------------+ > ^ ^ > prod cons > > 0 sizeof(array) > +-----------------------------------+ > | free | to consume | free | > +-----------------------------------+ > ^ ^ > cons prod > > The following function is provided to calculate how many bytes are > currently > left unconsumed in an array: > > #define _MASK_PVCALLS_IDX(idx, ring_size) ((idx) & (ring_size-1)) > > static inline PVCALLS_RING_IDX pvcalls_ring_queued(PVCALLS_RING_IDX > prod, > PVCALLS_RING_IDX cons, > PVCALLS_RING_IDX ring_size) > { > PVCALLS_RING_IDX size; > > if (prod == cons) > return 0; > > prod = _MASK_PVCALLS_IDX(prod, ring_size); > cons = _MASK_PVCALLS_IDX(cons, ring_size); > > if (prod == cons) > return ring_size; > > if (prod > cons) > size = prod - cons; > else { > size = ring_size - cons; > size += prod; > } > return size; > } > > The producer (the backend for **in**, the frontend for **out**) writes > to the > array in the following way: > > - read *cons*, *prod*, *error* from shared memory > - memory barrier > - 
return on *error* > - write to array at position *prod* up to *cons*, wrapping around the > circular > buffer when necessary > - memory barrier > - increase *prod* > - notify the other end via evtchn > > The consumer (the backend for **out**, the frontend for **in**) reads > from the > array in the following way: > > - read *prod*, *cons*, *error* from shared memory > - memory barrier > - return on *error* > - read from array at position *cons* up to *prod*, wrapping around the > circular > buffer when necessary > - memory barrier > - increase *cons* > - notify the other end via evtchn > > The producer takes care of writing only as many bytes as available in > the buffer > up to *cons*. The consumer takes care of reading only as many bytes as > available > in the buffer up to *prod*. *error* is set by the backend when an error > occurs > writing or reading from the socket. > > > [address]: > http://pubs.opengroup.org/onlinepubs/7908799/xns/syssocket.h.html > [in]: > http://pubs.opengroup.org/onlinepubs/000095399/basedefs/netinet/in.h.html > [socket]: > http://pubs.opengroup.org/onlinepubs/009695399/functions/socket.html > [connect]: http://pubs.opengroup.org/onlinepubs/7908799/xns/connect.html > [shutdown]: > http://pubs.opengroup.org/onlinepubs/7908799/xns/shutdown.html > [bind]: http://pubs.opengroup.org/onlinepubs/7908799/xns/bind.html > [listen]: http://pubs.opengroup.org/onlinepubs/7908799/xns/listen.html > [accept]: http://pubs.opengroup.org/onlinepubs/7908799/xns/accept.html > [poll]: http://pubs.opengroup.org/onlinepubs/7908799/xsh/poll.html > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxx > https://lists.xen.org/xen-devel
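The producer steps listed in the Workflow section of the quoted document can be sketched in plain C as follows. This is only an illustration under stated assumptions: `struct demo_ring` and `demo_ring_write` are hypothetical names standing in for one direction of a data ring, the indices are treated as free-running 32-bit counters (which the `pvcalls_ring_queued` helper appears to imply), C11 fences stand in for the kernel's memory barriers, and the event channel notification is left as a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t PVCALLS_RING_IDX; /* assumption: 32-bit indices */

/* Hypothetical stand-in for one direction (in or out) of a data ring. */
struct demo_ring {
    PVCALLS_RING_IDX cons, prod;
    int32_t error;
    uint32_t size; /* bytes in the array; must be a power of two */
    char *buf;
};

/* Writes up to len bytes; returns how many bytes were actually queued. */
static uint32_t demo_ring_write(struct demo_ring *r, const char *src, uint32_t len)
{
    /* Step 1: read cons, prod and error from shared memory. */
    PVCALLS_RING_IDX cons = r->cons, prod = r->prod;
    int32_t error = r->error;
    uint32_t free_bytes, off, first;

    atomic_thread_fence(memory_order_acquire); /* memory barrier */
    if (error)
        return 0; /* return on error */

    /* Never write past cons: clamp to the free space. */
    free_bytes = r->size - (prod - cons);
    if (len > free_bytes)
        len = free_bytes;

    /* Copy in at most two pieces, wrapping around the circular buffer. */
    off = prod & (r->size - 1);
    first = r->size - off;
    if (first > len)
        first = len;
    memcpy(r->buf + off, src, first);
    memcpy(r->buf, src + first, len - first);

    atomic_thread_fence(memory_order_release); /* memory barrier */
    r->prod = prod + len; /* increase prod */
    /* A real frontend or backend would now notify the peer via the evtchn. */
    return len;
}
```

The consumer side would mirror this: read *prod*, *cons* and *error*, copy out at most `prod - cons` bytes starting at `cons & (size - 1)`, then advance *cons* after a barrier.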