[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC][PATCH 01/13] tools: introduce some new parameters to set rdm policy



On Mon, May 11, 2015 at 01:35:06PM +0800, Chen, Tiejun wrote:
> On 2015/5/8 21:04, Wei Liu wrote:
> >Sorry for the late review.
> >
> 
> Really thanks for taking your time :)
> 
> >On Fri, Apr 10, 2015 at 05:21:52PM +0800, Tiejun Chen wrote:
> >>This patch introduces user configurable parameters to specify RDM
> >>resource and according policies,
> >>
> >>Global RDM parameter:
> >>     rdm = [ 'host, reserve=force/try' ]
> >>Per-device RDM parameter:
> >>     pci = [ 'sbdf, rdm_reserve=force/try' ]
> >>
> >>Global RDM parameter allows user to specify reserved regions explicitly,
> >>e.g. using 'host' to include all reserved regions reported on this platform
> >>which is good to handle hotplug scenario. In the future this parameter
> >>may be further extended to allow specifying random regions, e.g. even
> >>those belonging to another platform as a preparation for live migration
> >>with passthrough devices.
> >>
> >>'force/try' policy decides how to handle conflict when reserving RDM
> >>regions in pfn space. If conflict exists, 'force' means an immediate error
> >>so VM will be killed, while 'try' allows moving forward with a warning
> >>message thrown out.
> >>
> >>Default per-device RDM policy is 'force', while default global RDM policy
> >>is 'try'. When both policies are specified on a given region, 'force' is
> >>always preferred.
> >>
> >>Signed-off-by: Tiejun Chen <tiejun.chen@xxxxxxxxx>
> >>---
> >>  docs/man/xl.cfg.pod.5       | 44 +++++++++++++++++++++++++
> >>  docs/misc/vtd.txt           | 34 ++++++++++++++++++++
> >>  tools/libxl/libxl_create.c  |  5 +++
> >>  tools/libxl/libxl_types.idl | 18 +++++++++++
> >>  tools/libxl/libxlu_pci.c    | 78 
> >> +++++++++++++++++++++++++++++++++++++++++++++
> >>  tools/libxl/libxlutil.h     |  4 +++
> >>  tools/libxl/xl_cmdimpl.c    | 21 +++++++++++-
> >>  7 files changed, 203 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> >>index 408653f..9ed3055 100644
> >>--- a/docs/man/xl.cfg.pod.5
> >>+++ b/docs/man/xl.cfg.pod.5
> >>@@ -583,6 +583,36 @@ assigned slave device.
> >>
> >>  =back
> >>
> >>+=item B<rdm=[ "TYPE", "RDM_RESERVE_STRING", ... ]>
> >>+
> >
> >Shouldn't this be "TYPE,RDM_RESERVE_STRIGN" according to your commit
> >message? If the only available config is just one string, you probably
> >don't need a list for this?
> 
> Yes, based on that design we don't need a list. So
> 
> =item B<rdm=[ "RDM_RESERVE_STRING" ]>
> 

Note that this is still a list (enclosed by "[]"). Maybe you mean

   rdm = "RDM_RESERVE_STRING"

?

> >
> >>+(HVM/x86 only) Specifies the information about Reserved Device Memory 
> >>(RDM),
> >>+which is necessary to enable robust device passthrough usage. One example 
> >>of
> >>+RDM is reported through ACPI Reserved Memory Region Reporting (RMRR)
> >>+structure on x86 platform.
> >>+Each B<RDM_CHECK_STRING> has the form C<["TYPE",KEY=VALUE,KEY=VALUE,...> 
> >>where:
> >>+
> >
> >RDM_CHECK_STRING?
> 
> And here should be corrected like this,
> 
> B<RDM_RESERVE_STRING> has the form ...
> 
> >
> >>+=over 4
> >>+
> >>+=item B<"TYPE">
> >>+
> >>+Currently we just have one type. 'host' means all reserved device memory on
> >>+this platform should be reserved in this VM's pfn space.
> >>+
> >
> >What are other possible types? If there is only one type then we can
> 
> Currently we just have one type and looks that design doesn't make this
> clear.
> 
> >simply ignore the type?
> 
> I just think we may introduce something else specific to live migration in
> the future... But I'm really not sure right now.
> 

Fair enough. I was just wondering if there would be any other types. If
so we do need provisioning.

In any case, the "type" argument you proposed is a positional argument
(you require it to be the first element of the spec string").
I think you can just make it a key-value pair to make parsing easier.

> >
> >>+=item B<KEY=VALUE>
> >>+
> >>+Possible B<KEY>s are:
> >>+
> >>+=over 4
> >>+
> >>+=item B<reserve="STRING">
> >>+
> >>+Conflict may be detected when reserving reserved device memory in gfn 
> >>space.
> >>+'force' means an unsolved conflict leads to immediate VM destroy, while
> >
> >Do you mean "immediate VM crash"?
> 
> Yes. So I guess I should replace this.
> 
> >
> >>+'try' allows VM moving forward with a warning message thrown out. 'try'
> >>+is default.
> >
> >Can you please your double quotes for "force", "try" etc.
> 
> Sure. Just note we'd like to use "strict"/"relaxed" to replace "force"/"try"
> from next revision according to Jan's suggestion.
> 

No problem.

> >
> >>+
> >>+Note this may be overrided by another sub item, rdm_reserve, in pci device.
> >>+
> >>  =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
> >>
> >>  Specifies the host PCI devices to passthrough to this guest. Each 
> >> B<PCI_SPEC_STRING>
> >>@@ -645,6 +675,20 @@ dom0 without confirmation.  Please use with care.
> >>  D0-D3hot power management states for the PCI device. False (0) by
> >>  default.
> >>
> >>+=item B<rdm_check="STRING">
> >>+
> >>+(HVM/x86 only) Specifies the information about Reserved Device Memory 
> >>(RDM),
> >>+which is necessary to enable robust device passthrough usage. One example 
> >>of
> >>+RDM is reported through ACPI Reserved Memory Region Reporting (RMRR)
> >>+structure on x86 platform.
> >>+
> >>+Conflict may be detected when reserving reserved device memory in gfn 
> >>space.
> >>+'force' means an unsolved conflict leads to immediate VM destroy, while
> >>+'try' allows VM moving forward with a warning message thrown out. 'force'
> >>+is default.
> >>+
> >>+Note this would override another global item, rdm = [''].
> >>+
> >
> >Note this would override global B<rdm> option.
> 
> Fixed.
> 
> >
> >>  =back
> >>
> >>  =back
> >>diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
> >>index 9af0e99..d7434d6 100644
> >>--- a/docs/misc/vtd.txt
> >>+++ b/docs/misc/vtd.txt
> >>@@ -111,6 +111,40 @@ in the config file:
> >>  To override for a specific device:
> >>    pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
> >>
> >>+RDM, 'reserved device memory', for PCI Device Passthrough
> >>+---------------------------------------------------------
> >>+
> >>+There are some devices the BIOS controls, for e.g. USB devices to perform
> >>+PS2 emulation. The regions of memory used for these devices are marked
> >>+reserved in the e820 map. When we turn on DMA translation, DMA to those
> >>+regions will fail. Hence BIOS uses RMRR to specify these regions along with
> >>+devices that need to access these regions. OS is expected to setup
> >>+identity mappings for these regions for these devices to access these 
> >>regions.
> >>+
> >>+While creating a VM we should reserve them in advance, and avoid any 
> >>conflicts.
> >>+So we introduce user configurable parameters to specify RDM resource and
> >>+according policies,
> >>+
> >>+To enable this globally, add "rdm" in the config file:
> >>+
> >>+    rdm = [ 'host, reserve=force/try' ]
> >>+
> >
> >The "force/try" should be called "policy". And then you explain what
> >policies we have.
> 
> Do you mean we should rename this?
> 
> rdm = [ 'host, policy=force/try' ]
> 

No, I didn't ask you to rename that.

The above line is an example which should reflect the correct syntax.
"force/try" is not the *actual syntax*, i.e. you won't write that in
your config file.

I meant to changes it to "reserve=POLICY". Then you explain the possible
values of POLICY.

> This is really a policy but 'reserve' may can reflect our action explicitly,
> right?
> 
> >
> >>+Or just for a specific device:
> >>+
> >>+   pci = [ '01:00.0,rdm_reserve=force/try', '03:00.0' ]
> 
> And you also can see this.
> 
> But anyway, if you're really stick to rename this, I'm going to be fine as
> well but we should ping every one to check this point since this name is
> from our previous discussion.
> 
> >>+
> >>+Global RDM parameter allows user to specify reserved regions explicitly.
> >>+Using 'host' to include all reserved regions reported on this platform
> >>+which is good to handle hotplug scenario. In the future this parameter
> >>+may be further extended to allow specifying random regions, e.g. even
> >>+those belonging to another platform as a preparation for live migration
> >>+with passthrough devices.
> >>+
> >>+'force/try' policy decides how to handle conflict when reserving RDM
> >>+regions in pfn space. If conflict exists, 'force' means an immediate error
> >>+so VM will be killed, while 'try' allows moving forward with a warning
> >
> >Be killed by whom? I think it's hvmloader crashes voluntarily, right?
> 
> s/VM will be kille/hvmloader crashes voluntarily
> 
> >
> >>+message thrown out.
> >>+
> >>
> >>  Caveat on Conventional PCI Device Passthrough
> >>  ---------------------------------------------
> >>diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> >>index 98687bd..9ed40d4 100644
> >>--- a/tools/libxl/libxl_create.c
> >>+++ b/tools/libxl/libxl_create.c
> >>@@ -1407,6 +1407,11 @@ static void domcreate_attach_pci(libxl__egc *egc, 
> >>libxl__multidev *multidev,
> >>      }
> >>
> >>      for (i = 0; i < d_config->num_pcidevs; i++) {
> >>+        /*
> >>+         * If the rdm global policy is 'force' we should override each 
> >>device.
> >>+         */
> >>+        if (d_config->b_info.rdm.reserve == LIBXL_RDM_RESERVE_FLAG_FORCE)
> >>+            d_config->pcidevs[i].rdm_reserve = 
> >>LIBXL_RDM_RESERVE_FLAG_FORCE;
> >>          ret = libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1);
> >>          if (ret < 0) {
> >>              LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
> >>diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> >>index 47af340..5786455 100644
> >>--- a/tools/libxl/libxl_types.idl
> >>+++ b/tools/libxl/libxl_types.idl
> >>@@ -71,6 +71,17 @@ libxl_domain_type = Enumeration("domain_type", [
> >>      (2, "PV"),
> >>      ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")
> >>
> >>+libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
> >>+    (0, "none"),
> >>+    (1, "host"),
> >>+    ])
> >>+
> >>+libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [
> >>+    (-1, "invalid"),
> >>+    (0, "force"),
> >>+    (1, "try"),
> >>+    ])
> >
> >If you don't set init_val, the default value would be "force" (0), is this
> 
> Yes.
> 
> >want you want?
> 
> We have a little bit of complexity here,
> 
> "Default per-device RDM policy is 'force', while default global RDM policy
> is 'try'. When both policies are specified on a given region, 'force' is
> always preferred."
> 

This is going to be done in actual code anyway.

This type is used both in global and per-device setting, so I envisage
this to have an invalid value to start with. Appropriate default values
should be done in libxl_TYPE_setdefault functions. And the logic to
detect conflict and preferences done in your construct function.

What do you think?

> >
> >>+
> >>  libxl_channel_connection = Enumeration("channel_connection", [
> >>      (0, "UNKNOWN"),
> >>      (1, "PTY"),
> >>@@ -356,6 +367,11 @@ libxl_domain_sched_params = 
> >>Struct("domain_sched_params",[
> >>      ("budget",       integer, {'init_val': 
> >> 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
> >>      ])
> >>
> >>+libxl_rdm_reserve = Struct("rdm_reserve", [
> >>+    ("type",    libxl_rdm_reserve_type),
> >>+    ("reserve",   libxl_rdm_reserve_flag),
> >>+    ])
> >>+
> >>  libxl_domain_build_info = Struct("domain_build_info",[
> >>      ("max_vcpus",       integer),
> >>      ("avail_vcpus",     libxl_bitmap),
> >>@@ -401,6 +417,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
> >>      ("kernel",           string),
> >>      ("cmdline",          string),
> >>      ("ramdisk",          string),
> >>+    ("rdm",     libxl_rdm_reserve),
> >>      ("u", KeyedUnion(None, libxl_domain_type, "type",
> >>                  [("hvm", Struct(None, [("firmware",         string),
> >>                                         ("bios",             
> >> libxl_bios_type),
> >>@@ -521,6 +538,7 @@ libxl_device_pci = Struct("device_pci", [
> >>      ("power_mgmt", bool),
> >>      ("permissive", bool),
> >>      ("seize", bool),
> >>+    ("rdm_reserve",   libxl_rdm_reserve_flag),
> >>      ])
> >>
> >>  libxl_device_vtpm = Struct("device_vtpm", [
> >>diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
> >>index 26fb143..45be0d9 100644
> >>--- a/tools/libxl/libxlu_pci.c
> >>+++ b/tools/libxl/libxlu_pci.c
> >>@@ -42,6 +42,8 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, 
> >>unsigned int domain,
> >>  #define STATE_OPTIONS_K 6
> >>  #define STATE_OPTIONS_V 7
> >>  #define STATE_TERMINAL  8
> >>+#define STATE_TYPE      9
> >>+#define STATE_CHECK_FLAG      10
> >>  int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const 
> >> char *str)
> >>  {
> >>      unsigned state = STATE_DOMAIN;
> >>@@ -143,6 +145,17 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, 
> >>libxl_device_pci *pcidev, const char *str
> >>                      pcidev->permissive = atoi(tok);
> >>                  }else if ( !strcmp(optkey, "seize") ) {
> >>                      pcidev->seize = atoi(tok);
> >>+                }else if ( !strcmp(optkey, "rdm_reserve") ) {
> >>+                    if ( !strcmp(tok, "force") ) {
> >>+                        pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_FORCE;
> >>+                    } else if ( !strcmp(tok, "try") ) {
> >>+                        pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_TRY;
> >>+                    } else {
> >>+                        pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_FORCE;
> >>+                        XLU__PCI_ERR(cfg, "Unknown PCI RDM property flag 
> >>value:"
> >>+                                          " %s, so goes 'force' by 
> >>default.",
> >
> >If this is not an error, you don't need XLU__PCI_ERR.
> >
> >But I would say we should  just treat this as an error and
> >abort/exit/report (whatever the parser should do in this case).
> 
> In our case we just want to post a message to set a appropriate flag to
> recover this behavior like we write here,
> 
>                         XLU__PCI_ERR(cfg, "Unknown PCI RDM property flag
> value:"
>                                           " %s, so goes 'strict' by
> default.",
>                                      tok);

I suggest we just abort in this case and not second guess what the admin
wants.

> 
> This may just be a warning? But I don't we have this sort of definition,
> XLU__PCI_WARN, ...
> 
> So what LOG format can be adopted here?

Feel free to introduce XLU__PCI_WARN if it turns out to be necessary.

> 
> >
> >>+                                     tok);
> >>+                    }
> >>                  }else{
> >>                      XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", 
> >> optkey);
> >>                  }
> >>@@ -167,6 +180,71 @@ parse_error:
> >>      return ERROR_INVAL;
> >>  }
> >>
> >>+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
> >>+{
> >>+    unsigned state = STATE_TYPE;
> >>+    char *buf2, *tok, *ptr, *end;
> >>+
> >>+    if ( NULL == (buf2 = ptr = strdup(str)) )
> >>+        return ERROR_NOMEM;
> >>+
> >>+    for(tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
> >>+        switch(state) {

Coding style. I haven't checked what actual style this file uses, but
there is inconsistency in this function by itself.

> >>+        case STATE_TYPE:
> >>+            if ( *ptr == '\0' || *ptr == ',' ) {
> >>+                state = STATE_CHECK_FLAG;
> >>+                *ptr = '\0';
> >>+                if ( !strcmp(tok, "host") ) {
> >>+                    rdm->type = LIBXL_RDM_RESERVE_TYPE_HOST;
> >>+                } else {
> >>+                    XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
> >>+                    goto parse_error;
> >>+                }
> >>+                tok = ptr + 1;
> >>+            }
> >>+            break;
> >>+        case STATE_CHECK_FLAG:
> >>+            if ( *ptr == '=' ) {
> >>+                state = STATE_OPTIONS_V;
> >>+                *ptr = '\0';
> >>+                if ( strcmp(tok, "reserve") ) {
> >>+                    XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", 
> >>tok);
> >>+                    goto parse_error;
> >>+                }
> >>+                tok = ptr + 1;
> >>+            }
> >>+            break;
> >>+        case STATE_OPTIONS_V:
> >>+            if ( *ptr == ',' || *ptr == '\0' ) {
> >>+                state = STATE_TERMINAL;
> >>+                *ptr = '\0';
> >>+                if ( !strcmp(tok, "force") ) {
> >>+                    rdm->reserve = LIBXL_RDM_RESERVE_FLAG_FORCE;
> >>+                } else if ( !strcmp(tok, "try") ) {
> >>+                    rdm->reserve = LIBXL_RDM_RESERVE_FLAG_TRY;
> >>+                } else {
> >>+                    XLU__PCI_ERR(cfg, "Unknown RDM property flag value: 
> >>%s",
> >>+                                 tok);
> >>+                    goto parse_error;
> >>+                }
> >>+                tok = ptr + 1;
> >>+            }
> >>+        default:
> >>+            break;
> >>+        }
> >>+    }
> >>+
> >>+    free(buf2);
> >>+
> >>+    if ( tok != ptr || state != STATE_TERMINAL )
> >>+        goto parse_error;
> >>+
> >>+    return 0;
> >>+
> >>+parse_error:
> >>+    return ERROR_INVAL;
> >>+}
> >>+
> >>  /*
> >>   * Local variables:
> >>   * mode: C
> >>diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
> >>index 0333e55..80497f8 100644
> >>--- a/tools/libxl/libxlutil.h
> >>+++ b/tools/libxl/libxlutil.h
> >>@@ -93,6 +93,10 @@ int xlu_disk_parse(XLU_Config *cfg, int nspecs, const 
> >>char *const *specs,
> >>   */
> >>  int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const 
> >> char *str);
> >>
> >>+/*
> >>+ * RDM parsing
> >>+ */
> >>+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char 
> >>*str);
> >>
> >>  /*
> >>   * Vif rate parsing.
> >>diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> >>index 04faf98..9a58464 100644
> >>--- a/tools/libxl/xl_cmdimpl.c
> >>+++ b/tools/libxl/xl_cmdimpl.c
> >>@@ -988,7 +988,7 @@ static void parse_config_data(const char *config_source,
> >>      const char *buf;
> >>      long l;
> >>      XLU_Config *config;
> >>-    XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms;
> >>+    XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms, 
> >>*rdms;
> >>      XLU_ConfigList *channels, *ioports, *irqs, *iomem, *viridian;
> >>      int num_ioports, num_irqs, num_iomem, num_cpus, num_viridian;
> >>      int pci_power_mgmt = 0;
> >>@@ -1727,6 +1727,23 @@ skip_vfb:
> >>          xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 
> >> 0);
> >>      }
> >>
> >>+    /*
> >>+     * By default our global policy is to query all rdm entries, and
> >>+     * force reserve them.
> >>+     */
> >>+    b_info->rdm.type = LIBXL_RDM_RESERVE_TYPE_HOST;
> >>+    b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_TRY;
> >
> >This should probably to into the _setdefault function of
> >libxl_domain_build_info.
> 
> Sorry, I just see this
> 
> libxl_domain_build_info_init()
>     |
>     + libxl_rdm_reserve_init(&p->rdm);
>       |
>       + memset(p, '\0', sizeof(*p));
> 
> But this should be generated automatically, right? So how to implement your
> idea? Could you give me a show?
> 

Check libxl_domain_build_info_setdefault.

To use libxl types. You normally do:

  libxl_TYPE_init
  libxl_TYPE_setdefault

  DO STUFF

  libxl_TYPE_dispose

_init and _dispose are auto-generated. _setdefault is not.

> >
> >>+    if (!xlu_cfg_get_list (config, "rdm", &rdms, 0, 0)) {
> >>+        if ((buf = xlu_cfg_get_listitem (rdms, 0)) != NULL ) {
> >>+            libxl_rdm_reserve rdm;
> >>+            if (!xlu_rdm_parse(config, &rdm, buf))
> >>+            {
> >>+                b_info->rdm.type = rdm.type;
> >>+                b_info->rdm.reserve = rdm.reserve;
> >>+            }
> >
> >You only have one rdm in b_info, so there is no need to use a list for
> >it in config file.
> >
> 
> Is this fine?
> 
>     if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
> 
>         libxl_rdm_reserve rdm;
> 
>         if (!xlu_rdm_parse(config, &rdm, buf)) {
>             b_info->rdm.type = rdm.type;
> 
>             b_info->rdm.reserve = rdm.reserve;
> 
>         }
>     }
> 

I think it is fine. But you'd better wait a little bit for other people
to voice their opinions.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.