This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Re-design the architecture of Xen

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Re-design the architecture of Xen
From: henanwxr <henanwxr@xxxxxxx>
Date: Mon, 23 May 2011 04:39:37 -0700 (PDT)
http://xen.1045712.n5.nabble.com/file/n4418793/6.bmp

We have researched virtualization for several years. Using Xen as a
reference, we have designed a new VMM architecture called the Cooperative
model VMM and have implemented a prototype system.
We present its principles and some of its details here.

Part1 motivation

B. Domain0 problems
Domain0 has several features:
       - It runs a modified operating system.
       - It runs at processor privilege level 1.
       - It runs as a virtual machine.
       - It is the single system that manages hardware.
These features of Domain0 lead to the following issues:
1) Tight coupling
From a performance point of view, coordination between Domain0 and the VMM
(through hypercalls, event channels, and IO rings) can improve
virtualization efficiency. However, it requires more modification of the
guest operating system, and the VMM must provide the corresponding
interfaces. The resulting tight coupling between Domain0 and the VMM means
that the VMM implementation must take the characteristics of a third-party
system into account, so the design lacks independence and flexibility.
2) Privilege level switch
Domain0 runs at processor privilege level 1, so a context switch from the
VMM to Domain0 triggers a processor privilege level switch. When such
switches are frequent (for example, when handling IO requests for a virtual
machine), they increase processor overhead and reduce the performance of
the virtual machine.
3) Management overhead
Because Domain0 operates as a virtual machine, the VMM must provide the
corresponding virtual machine management interfaces (creation, resource
allocation, scheduling, destruction, and so on), which adds administrative
overhead.
As the main provider of device access, Domain0 has a relatively fixed
function, so this administrative overhead should be avoided to reduce the
burden on the VMM.
4) Scheduling delay
Domain0 takes part in VMM scheduling along with the other virtual machines.
Because scheduling rotates among them, Domain0 cannot guarantee timely
delivery of its services, which causes several related problems. First,
after the VMM receives an IO request from a virtual machine, Domain0 cannot
be notified immediately. Only an asynchronous notification, similar to a
soft interrupt, can be used, and Domain0 checks for and processes it the
next time it runs. Second, the device model of Domain0 is provided by Qemu,
which runs as a process of the guest OS. When Domain0 is not running, Qemu
cannot handle IO requests from virtual machines, which delays IO request
processing. Third, the scheduling of other virtual machines depends on
virtual clock interrupts. Having Domain0 emulate the virtual clock leads to
problems with virtual clock synchronization, virtual machine scheduling,
and clock synchronization between virtual cores (the virtual clock
implementation has since been moved from Domain0 into the VMM).
5) IOPM bottleneck
When multiple virtual machines are running, IO requests become quite
frequent. Because Domain0 is the only IOPM (IO process machine) in the
entire system, all IO requests are handled through Domain0, which makes it
a bottleneck. Furthermore, if the IOPM fails and cannot be replaced
promptly by an alternative IOPM, the entire system has to be restarted,
which delays or even brings down the services of the virtual machines.
   The main cause of the Domain0 problems described above is that the IOPM
is virtualized and acts as a subsidiary module of the VMM. Because the
natural role of Domain0 is to provide device access services to the VMM, a
possible solution is to separate the IOPM thoroughly from the VMM while
Domain0 continues to provide those services. This involves four aspects:
       - Weakening the coupling between the VMM and Domain0 to increase the
independence of the VMM.
       - Reducing VMM interference with Domain0 to give Domain0 the right to
       - Establishing interaction between the VMM and Domain0 to ensure that
Domain0 provides device access services to the VMM.
       - Providing multiple IOPMs to achieve load balancing.
With these considerations, the operating system does not need extensive
modification to implement an IOPM, and the IOPM interacts with the VMM
through only a small number of interfaces. By controlling hardware
resources directly, the IOPM changes from a subsidiary module of the VMM
into a cooperating module of the VMM. The cooperative model VMM discussed
below implements and verifies this kind of IOPM.

Part2 Cooperative model VMM 

A. Cooperative model description
With the popularity of multi-core processors and large-capacity memory,
hardware resources on PCs are no longer scarce. In the 1960s, the IBM
S/360 mainframe used a hardware partitioning approach to implement
virtualization, which provides useful inspiration for the current PC
platform.
   To address the problems of a virtualized IOPM that is tightly coupled
with the VMM in the hybrid model, hardware partitioning can be used to let
the IOPM control part of the hardware resources directly. The IOPM thereby
changes from a virtual machine into a privileged machine, and the IOPM and
the VMM form a cooperative structure. The main control system consists of
two parts: the VMM, which implements processor and memory virtualization,
and the IOPM, which controls peripherals and provides the device model.
More than one IOPM can exist. Each IOPM controls one AP, while the VMM
controls the BSP and the remaining APs, as shown in Fig 5. The cooperative
model has the following characteristics:
       - Elimination of tight coupling between the VMM and the IOPM, which
interact through only a handful of interfaces.
       - Independence of the IOPM from VMM monitoring and scheduling.
       - Multiple IOPMs running in parallel for load balancing and failure
replacement.
                                   Figure 5. Structure of cooperative VMM
B. Interrupt handling 
1) IOPM controls right of interrupt reception
If device interrupts are delivered directly to the IOPM, the device access
path of the IOPM appears to be shortened, as shown in Fig 6.
                    Figure 6. IOPM controls right of interrupt reception
   In this way, the IOPM holds the rights of both external interrupt
reception and processing. However, consider the following three situations:
       - The IOPM contains a large number of device drivers, whose stability
affects the security of the IOPM and of the whole system. If the IOPM fails
because of a device driver failure, the corresponding device interrupts can
no longer be answered, so virtual machine IO requests cannot be processed.
       - In some cases, a small number of special device drivers need to be
integrated into the VMM so that IO requests can be handled within the VMM
without being delivered to the IOPM. This improves the efficiency of device
access for certain devices with a high interrupt frequency (clock, network
card, etc.).
       - To improve the stability of the whole system, drivers should be
distributed across multiple IOPMs to prevent a single IOPM failure from
bringing down the entire system. In this case, the VMM needs to control the
right of interrupt reception and submit the interrupt to another IOPM.
This analysis shows that letting the IOPM control the right of interrupt
reception is a serious problem. Interrupt reception and interrupt handling
need to be separated: the VMM receives interrupts, while the IOPM handles
them. Having the VMM control the right of interrupt reception achieves
device control at minimal expense.

2) VMM controls right of interrupt reception
To solve the problems that arise when the IOPM controls the right of
interrupt reception, interrupt handling can be improved as follows.
External interrupts are first submitted to the VMM, which provides an
interrupt routing function. Depending on the actual circumstances, the VMM
either handles an interrupt directly or routes it to an appropriate IOPM,
as shown in Fig 7.
                    Figure 7. VMM controls right of interrupt reception
 The improved VMM has the following characteristics in device processing:
       - Interrupts are received and routed by the VMM, which improves the
flexibility of interrupt handling.
       - The VMM directly integrates some key device drivers to shorten the
device access path.
       - Device drivers are distributed across multiple IOPMs to achieve load
balancing and failure replacement.
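
The routing step described above can be illustrated with a small sketch.
This is a toy model in Python, not the prototype's code: the names Iopm,
InterruptRouter and dispatch, the "alive" flag, and the IRQ numbers are all
assumptions made for illustration. It shows only the three decisions named
in the text: handle in the VMM, route to an IOPM, or fall back to another
IOPM when the first one has failed.

```python
from dataclasses import dataclass, field


@dataclass
class Iopm:
    """Hypothetical stand-in for one IOPM instance."""
    name: str
    alive: bool = True
    handled: list = field(default_factory=list)

    def handle(self, irq: int) -> None:
        # A real IOPM would run its device driver here.
        self.handled.append(irq)


class InterruptRouter:
    """Toy model of the VMM-side routing step (all names are illustrative)."""

    def __init__(self, vmm_irqs, routes):
        self.vmm_irqs = set(vmm_irqs)  # IRQs served by drivers inside the VMM
        self.routes = routes           # irq -> ordered list of candidate IOPMs
        self.vmm_handled = []

    def dispatch(self, irq: int) -> str:
        # Every external interrupt reaches the VMM first.
        if irq in self.vmm_irqs:
            self.vmm_handled.append(irq)
            return "vmm"
        # Otherwise forward it to the first live IOPM that has a driver.
        for iopm in self.routes.get(irq, []):
            if iopm.alive:
                iopm.handle(irq)
                return iopm.name
        raise RuntimeError(f"no live IOPM can handle IRQ {irq}")
```

For example, with routes={14: [iopm0, iopm1]}, dispatch(14) goes to iopm0
while it is alive and to iopm1 after iopm0 is marked as failed. How the
prototype detects a failed IOPM is not described here, so the flag is only
a placeholder.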

Part3 Model implementation

Implementing the cooperative VMM requires dividing the hardware resources,
which eliminates hardware control conflicts between the VMM and the IOPM.
On this basis, an appropriate operating system is selected and transformed
into an IOPM. Currently, the implementation of this model is based on a
dual-processor platform with Intel VT-x, and the IOPM is based on Linux.
A. Hardware division
The hardware division between the IOPM and the VMM is shown in Table 1.
                  TABLE 1.   HARDWARE DIVISION BETWEEN IOPM AND VMM
1) Processor
The IOPM controls a single processor and cannot be used for
multi-processor-related operations. The BSP must run first after the
machine starts and is controlled by the VMM. The VMM can then start an AP
and run the IOPM on it at an appropriate time, so that the VMM and the IOPM
run in parallel.
2) Memory
Physical memory is partitioned between the VMM and the IOPM, but the two
can exchange data through shared memory.
3) Interrupt
External interrupts must first be submitted to the BSP, where the VMM is
located, and the VMM decides how to handle them.
4) Clock
Both the VMM and the IOPM need to schedule their internal programs. Since
scheduling depends on clock interrupts, clock interrupts need to be
submitted to the VMM and the IOPM at the same time.
5) IO Device
IO devices are controlled by the IOPM. IO requests from a virtual machine
are submitted to the IOPM through the VMM, and the device is accessed with
the help of the IOPM's device driver.
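
As a rough summary of the processor and interrupt items above, the division
can be written down as a small Python sketch. The function names, the
assumption that the first entry of cpu_ids is the BSP, and the choice of
which APs go to IOPMs are all illustrative; none of them is taken from the
prototype.

```python
def divide_cpus(cpu_ids, iopm_count):
    """Map each CPU to its owner: the VMM or one IOPM.

    Assumption: cpu_ids[0] is the BSP. The first iopm_count APs are
    given to IOPMs (one AP each); the VMM keeps the BSP and the rest.
    """
    if not cpu_ids:
        raise ValueError("need at least the BSP")
    bsp, aps = cpu_ids[0], cpu_ids[1:]
    if iopm_count > len(aps):
        raise ValueError("each IOPM needs its own AP")
    owners = {bsp: "VMM"}
    for index, ap in enumerate(aps):
        owners[ap] = f"IOPM{index}" if index < iopm_count else "VMM"
    return owners


def interrupt_recipients(kind, iopm_count):
    """Return who receives an interrupt of the given kind."""
    if kind == "external":
        # External interrupts go to the VMM first; it decides what to do.
        return ["VMM"]
    if kind == "clock":
        # Both sides schedule their own work, so both get the clock tick.
        return ["VMM"] + [f"IOPM{i}" for i in range(iopm_count)]
    raise ValueError(f"unknown interrupt kind: {kind}")
```

On the dual-processor platform mentioned above, divide_cpus([0, 1], 1)
gives the BSP to the VMM and the single AP to one IOPM.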
B. IOPM Implementation
Implementation of IOPM involves four aspects:
1)      Boot IOPM
Traditionally, Linux is loaded by a boot loader such as grub. The Linux
kernel code is divided into two parts: real mode code and protected mode
code. According to the Linux boot protocol, the boot loader must copy the
real mode code to a location below 1M and parse the kernel header
information in order to copy the protected mode code to the specified
location. The boot loader then jumps to the real mode code, and the
operating system takes control.
   Booting the IOPM from the VMM also needs to simulate this flow. The
Linux real mode code is copied to a free location below 1M. Traditionally,
the protected mode code is located at 1M, which is already occupied by the
VMM, so the protected mode code is copied to another safe zone. After
laying out the IOPM code, the VMM boots the AP. The AP needs to switch to
real mode before executing the IOPM and then jump to the starting address
of the real mode code. The flow is shown in Fig 8.
                                               Figure 8. Flow of booting
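
The layout step can be sketched in Python as follows. The header offsets
used here (setup_sects at 0x1F1, the "HdrS" magic at 0x202, and a real mode
part of (setup_sects + 1) * 512 bytes) come from the Linux x86 boot
protocol. Everything else is an assumption for illustration: the function
names, the load addresses passed in, and the way the VMM-occupied range is
described.

```python
ONE_MIB = 0x100000
SECTOR_SIZE = 512
SETUP_SECTS_OFFSET = 0x1F1   # one byte: number of setup sectors
HEADER_MAGIC_OFFSET = 0x202  # four bytes: b"HdrS"


def split_kernel_image(image: bytes):
    """Split a kernel image into its real mode and protected mode parts."""
    magic = image[HEADER_MAGIC_OFFSET:HEADER_MAGIC_OFFSET + 4]
    if magic != b"HdrS":
        raise ValueError("no Linux boot protocol header found")
    # A value of 0 means 4, for backwards compatibility.
    setup_sects = image[SETUP_SECTS_OFFSET] or 4
    real_size = (setup_sects + 1) * SECTOR_SIZE  # boot sector + setup code
    return image[:real_size], image[real_size:]


def plan_iopm_layout(image, real_base, prot_base, vmm_range):
    """Choose load addresses; all addresses are caller-supplied assumptions."""
    real, prot = split_kernel_image(image)
    if real_base + len(real) > ONE_MIB:
        raise ValueError("real mode code must fit below 1M")
    vmm_start, vmm_end = vmm_range
    if prot_base < vmm_end and prot_base + len(prot) > vmm_start:
        raise ValueError("protected mode code overlaps memory used by the VMM")
    return {
        "real_mode": (real_base, len(real)),
        "protected_mode": (prot_base, len(prot)),
    }
```

The sketch covers only the copy planning. Switching the AP to real mode and
jumping into the real mode code are hardware-level steps that are not
modelled here.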
2)      Physical memory isolation
To achieve address space isolation and data exchange between the VMM and
the IOPM, the entire physical memory is divided into three parts: the VMM
management zone, the IOPM management zone, and the shared zone. The
management zones take part in dynamic allocation and reclamation by the
memory manager. The shared zone can only be accessed and does not take
part in allocation. The division of physical memory and its properties are
shown in Fig 9.
                  Figure 9. Division of physical memory and its properties
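
A minimal Python sketch of this three-zone division is given below. The
order of the zones, the 4096-byte page size, and the names Zone,
divide_memory and PageAllocator are assumptions, since the figure is not
reproduced here. The point it illustrates is that only the two management
zones get an allocator, while the shared zone is never handed out.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size


@dataclass(frozen=True)
class Zone:
    name: str
    start: int         # inclusive
    end: int           # exclusive
    allocatable: bool  # False for the shared zone


def divide_memory(total, vmm_size, shared_size):
    """Split [0, total) into VMM, shared and IOPM zones (order is assumed)."""
    for size in (total, vmm_size, shared_size):
        if size <= 0 or size % PAGE_SIZE:
            raise ValueError("sizes must be positive multiples of the page size")
    if vmm_size + shared_size >= total:
        raise ValueError("no memory left for the IOPM zone")
    shared_end = vmm_size + shared_size
    return [
        Zone("vmm", 0, vmm_size, True),
        Zone("shared", vmm_size, shared_end, False),
        Zone("iopm", shared_end, total, True),
    ]


class PageAllocator:
    """Very small page allocator bound to one management zone."""

    def __init__(self, zone):
        if not zone.allocatable:
            raise ValueError(f"zone {zone.name!r} does not take part in allocation")
        self.zone = zone
        self.free_pages = list(range(zone.start, zone.end, PAGE_SIZE))

    def alloc(self):
        if not self.free_pages:
            raise MemoryError(f"zone {self.zone.name!r} is exhausted")
        return self.free_pages.pop()

    def release(self, addr):
        in_zone = self.zone.start <= addr < self.zone.end
        if not in_zone or addr % PAGE_SIZE or addr in self.free_pages:
            raise ValueError("address does not belong to an allocated page")
        self.free_pages.append(addr)
```

Because each allocator only ever returns addresses inside its own zone, the
VMM and the IOPM cannot hand out each other's memory in this model.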
3) Communications between VMM and IOPM
The VMM and the IOPM generally communicate in two situations. First, the
VMM captures an IO request issued by a virtual machine and submits it to
the IOPM, and the IOPM then returns the processing result to the VMM.
Second, a user issues a request to the VMM through the user interface
provided by the IOPM to carry out a virtual machine operation. The
communication mechanism is built on IPIs and shared memory: IPIs are used
for message notification between the IOPM and the VMM, and shared memory
is used for temporary storage of the exchanged data.
4) Shared memory
Shared memory is used for temporary storage of the data exchanged between
the VMM and the IOPM. To prevent buffer overflow, the shared memory needs
to be organized. It is divided into four parts: the VMM control area, the
IOPM control area, the VMM data area, and the IOPM data area. The public
control pointers stored in the control areas are used to operate on the
data packets in the data areas. Each data area is organized as a ring: the
VMM data area temporarily stores data packets sent from the VMM to the
IOPM, and the IOPM data area temporarily stores data packets sent from the
IOPM to the VMM.
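
The ring organization and the IPI notification can be sketched together in
Python. This is a model under stated assumptions, not the prototype's data
layout: the head and tail indices stand in for the control pointers, a
Python list stands in for a data area, and send_ipi is a caller-supplied
callback that stands in for a real inter-processor interrupt.

```python
class Ring:
    """One direction of the channel: control pointers plus a data ring."""

    def __init__(self, slots):
        if slots < 2:
            raise ValueError("a ring needs at least two slots")
        self.data = [None] * slots  # data area
        self.head = 0               # control area: next slot to read
        self.tail = 0               # control area: next slot to write

    def put(self, packet):
        next_tail = (self.tail + 1) % len(self.data)
        if next_tail == self.head:
            return False            # full: refuse instead of overwriting
        self.data[self.tail] = packet
        self.tail = next_tail
        return True

    def get(self):
        if self.head == self.tail:
            return None             # empty
        packet = self.data[self.head]
        self.head = (self.head + 1) % len(self.data)
        return packet


class SharedChannel:
    """Two rings, one per direction, with a notification after each write."""

    def __init__(self, slots, send_ipi):
        self.vmm_to_iopm = Ring(slots)  # VMM control area + VMM data area
        self.iopm_to_vmm = Ring(slots)  # IOPM control area + IOPM data area
        self.send_ipi = send_ipi        # hypothetical stand-in for an IPI

    def vmm_send(self, packet):
        ok = self.vmm_to_iopm.put(packet)
        if ok:
            self.send_ipi("iopm")       # tell the IOPM that data is waiting
        return ok

    def iopm_send(self, packet):
        ok = self.iopm_to_vmm.put(packet)
        if ok:
            self.send_ipi("vmm")
        return ok
```

One slot is always left empty so that a full ring can be told apart from an
empty one, which means a ring with N slots holds at most N - 1 packets. A
sender that gets False back has to retry later, which is how this sketch
avoids the buffer overflow mentioned above.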
