
Re: [Xen-devel] [RFC] [Draft Design] ACPI/IORT Support in Xen.





On 10/27/2017 7:35 PM, Andre Przywara wrote:
Hi,
Hey Andre,

On 25/10/17 09:22, Manish Jaggi wrote:

On 10/23/2017 7:27 PM, Andre Przywara wrote:
Hi Manish,

On 12/10/17 22:03, Manish Jaggi wrote:
ACPI/IORT Support in Xen.
--------------------------------------

I had sent out a patch series [0] to hide the SMMU from the Dom0 IORT.
This extends the scope to include all that is required to support
ACPI/IORT in Xen. Presented for review is the first _draft_ of the
design of ACPI/IORT support in Xen. It is not complete yet.

Discussed are the parsing and generation of the IORT table for Dom0 and
DomUs. It is proposed that the IORT be parsed and the information saved
into a Xen data structure, say host_iort_struct, which is then reused
by all Xen subsystems like the ITS / SMMU etc.

Since this is a first draft, it is open to technical comments,
modifications and suggestions. Please feel free to add any missing
points / additions.

1. What is IORT. What are its components ?
2. Current Support in Xen
3. IORT for Dom0
4. IORT for DomU
5. Parsing of IORT in Xen
6. Generation of IORT
7. Future Work and TODOs

1. What is IORT. What are its components ?
--------------------------------------------
IORT refers to the IO Remapping Table. It is essentially used to find
information about the IO topology (PCIRC-SMMU-ITS) and the
relationships between devices.

The general structure of the IORT has nodes which carry information
about PCI RCs, SMMUs, the ITS and platform devices. Using the IORT
table, the relationship RID -> StreamID -> DeviceID can be obtained;
more specifically, which device is behind which SMMU and which
interrupt controller. This topology is described in the IORT table.

RID is a requester ID in the PCI context,
StreamID is the ID of the device in the SMMU context,
DeviceID is the ID programmed into the ITS.

For a non-PCI device, the RID could simply be an ID.

Each iort_node contains an ID map array to translate from one ID into
another:
IDmap Entry {input_range, output_range, output_node_ref, id_count}
This array is present in the PCI RC node, SMMU node, Named Component
node etc. and can reference an SMMU or ITS node.

2. Current Support of IORT
---------------------------
Currently Xen passes the host IORT table to Dom0 without any
modifications. For a DomU, no IORT table is passed.

3. IORT for Dom0
-----------------
The IORT for Dom0 is prepared by Xen and it is fairly similar to the
host IORT. However, a few nodes could be removed or modified. For
instance:
- host SMMU nodes should not be present
- ITS group nodes are same as host iort but, no stage2 mapping is done
for them.
What do you mean with stage2 mapping?
Please ignore this line. Copy paste error. Read it as follows

- ITS group nodes are same as host iort.
(though I would modify the same as in next draft)

- platform nodes (Named Components) may be selectively present,
depending on whether Xen is using some of them. This could be
controlled by the Xen command line.
Mmh, I am not so sure platform devices described in the IORT (those
which use MSIs!) are so much different from PCI devices here. My
understanding is those platform devices are network adapters, for
instance, for which Xen has no use.
ok.
So I would translate "Named Components" or "platform devices" as devices
just not using the PCIe bus (so no config space and no (S)BDF), but
being otherwise the same from an ITS or SMMU point of view.
Correct.
- More items : TODO
I think we agreed upon rewriting the IORT table instead of patching it?
yes. In fact if you look at my patch v2 on IORT SMMU hiding, it was
_rewriting_ most of Dom0 IORT and not patching it.
I was just after the wording above:
"IORT for Dom0 is prepared by xen and it is fairly similar to the host
iort. However few nodes could be removed removed or modified."
... which sounds a bit like you alter the h/w IORT.
It would be good to clarify this by explicitly mentioning the
parsing/generation cycle, as this is a fundamental design decision.
Sure will do that. Thanks for pointing that.
We can have a IRC discussion on this.

I think apart from rewriting, the other required tasks handled in this
epic are:
- parse the IORT and save it in Xen internal data structures
- common code to generate the IORT for Dom0/DomU
- all Xen code that parses the IORT multiple times now uses the Xen
internal data structures.
Yes, that sounds about right.
:)

(I have explained this in this mail below)
So to some degree your statements are true, but when we rewrite the IORT
table without SMMUs (and possibly without other components like the
PMUs), it would be kind of a stretch to call it "fairly similar to the
host IORT". I think "based on the host IORT" would be more precise.
Yes. Based on host IORT is better,thanks.
4. IORT for DomU
-----------------
IORT for DomU is generated by the toolstack. IORT topology is different
when DomU supports device passthrough.
Can you elaborate on that? Different compared to what? My understanding
is that without device passthrough there would be no IORT in the first
place?
I was exploring the possibility of having virtual devices for DomU.
So if a virtual device is assigned to a guest, there needs to be some
mapping in the IORT as well.
This virtual device can be on a PCI bus / or a platform device.

Device Pass-through can be split into two parts
a. platform device passthrough (not on PCI bus)
b. PCI device PT
I understand that, but am still wondering how it would be "different".
We just start with creating our mapping data structure *from scratch*,
the same one we generate by *parsing* the host IORT.
Whether this points to a purely virtual device, a PCI PT or a platform
PT, should not matter for this purpose.
I rest my case till I can cite a valid example :)
=> If we discount the possibility of a virtual device for a DomU and
platform device passthrough, then you are correct, no IORT is required.
I believe we need an IORT once we have devices which use MSIs.
yes.
When PCI device passthrough is supported, the PCIRC is itself virtual
(emulated by Xen).
One can have any number of virtual PCIRCs and maybe virtual SMMUs.
Hence the topology can vary.
I think I don't disagree, my initial comment was just about the
confusion that this "IORT topology is *different* from" term created.
Ok, I will move it in a different section and remove the term "different".

Now read the below lines.
At a minimum, the DomU IORT should include a single PCIRC and ITS
group. A matching PCIRC can be added in the DSDT.
An additional node can be added if a platform device is assigned to the
DomU.
No extra node should be required for PCI device pass-through.
Again I don't fully understand this last sentence.
The last line is continuation of the first line "At a minimum..."
OK, but still I don't get how we would end up with an IORT without
(pass-throughed) PCI devices in the first place?
If hypothetically a platform device uses MSI.
I will let Sameer comment on it.
Our platform does not have a Named Component node in IORT.
It is proposed that the ID range of the PCIRC and the ITS group be
constant for DomUs.
"constant" is a bit confusing here. Maybe "arbitrary", "from scratch" or
"independent from the actual h/w"?
ok. that is implementation defined.
In the case of PCI PT, using a domctl the toolstack can communicate the
physical RID : virtual RID and deviceID : virtual deviceID mappings to
Xen.

It is assumed that DomU PCI config accesses would be trapped by Xen.
The RID at which an assigned device is enumerated would be the one
provided by the domctl, domctl_set_deviceid_mapping.

TODO: device assign domctl i/f.
Note: This should suffice for the virtual deviceID support pointed out
by Andre. [4]
Well, there's more to it. First thing: while I tried to allow virtual
ITS deviceIDs to be different from physical ones, at the moment they
are fixed to being mapped 1:1 in the code.
oh
So the first step would be to go over the ITS code and identify where
"devid" refers to a virtual deviceID and where to a physical one
(probably renaming them accordingly). Then we would need a function to
translate between the two. At the moment this would be a dummy function
(just returning the input value). Later we would plug in the actual
table lookup.
A thought here:
wouldn't it be better to call a helper function to translate the devid
coming from the guest? The helper function would look at the table
created by handling successive domctls (the one mentioned here).
Exactly.
Thanks
We might not need this domctl if the assign_device hypercall is
extended to provide this information.
Do we actually need a new interface or even to extend the existing one?
If I got Julien correctly, the existing interface is just fine?
Could you explain which existing interface can be used to translate a
guest deviceID to a host deviceID when an ITS command gets trapped in
Xen? Maybe I am missing something here.
I haven't looked in detail, but will do.

5. Parsing of IORT in Xen
--------------------------
IORT nodes can be saved in structures so that IORT table parsing is
done once and reused by all Xen subsystems like the ITS / SMMU etc. and
domain creation.
Proposed are structures to hold the IORT information, very similar to
the ACPI structures.

struct iort_id_map {
      range_t input_range;
      range_t output_range;
      void *output_reference;
      ...
}
I guess you would need a "struct list_head list" here to chain the
ranges?
yes :). That was in ...
=>output_reference points to object of iort_node.

struct iort_node {
      struct list_head id_map;
      void *context;
      struct list_head list;
}
=> context could be a reference to acpi_iort_node.

struct iort_table_struct {
      struct list_head pci_rc_nodes;
      struct list_head smmu_nodes;
      struct list_head plat_devices;
      struct list_head its_group;
}
So quickly brainstorming with Julien I was wondering if we could
actually simplify this significantly:
  From Xen's point of view all we need to know is the mapping between PCI
requestor IDs (or some platform device IDs) to the physical ITS device
ID, and from requestor IDs to the SMMU stream ID.
That would be just *two* lookup tables, not connected to each other
aside from possibly having the same input ranges. At this point we could
also have *one* table, containing both the ITS deviceID and the SMMU
stream ID:

struct iort_id_map {
     range_t input_range;
     uint32_t its_devid_base;
     uint32_t smmu_streamid_base;
     struct list_head list;
};
This is just a simpler case of what the spec supports.
Sure, and without the need for a *chained* mapping (RC->SMMU->ITS)
this is actually all we need (RC->ITS; RC->SMMU). I have the feeling
that this simplifies the code a lot.
There are a few cases which the spec supports but which are not found
in present-day hardware. For instance:
a. The spec allows having two PCI_RCs behind the same SMMU and that
SMMU behind a single ITS. How would you map an ITS deviceID back to a
PCI_RC?

b. Similarly, if the simplified list RC->ITS is created (which is in
fact similar to the IORT for Dom0, hiding the SMMU), how would you know
which SMMU the device is on?
So you have to look up another list, which is RC->SMMU.

Each PCIRC node can have an array of ID maps, and each ID map entry can
map to multiple ID map entries in the output-referenced SMMU node's ID
map.

I had a similar discussion with Julien on the v1 version of my IORT
SMMU hide patch.

Moreover, I don't quite understand where iort_id_map would fit.
As you can see in the reply below, we have other things to take care of
as well.
So parsing the IORT would create and fill a list of those structures.
For a lookup we would just iterate over that list, find a matching entry
and:
return (input_id - match->input_range.base) + match->its_devid_base;

Ideally we abstract this via some functions, so that we can later swap
this for more efficient data structures should the need arise.

This structure is created at the point the IORT table is parsed, say
from acpi_iort_init.
It is proposed to use this structure's information in
iort_init_platform_devices.
[2] [RFC v2 4/7] ACPI: arm: Support for IORT
I guess you missed this part. Without it, the context is lost.

The main purpose of this whole task is split into two parts:
a. IORT parsing should be done once in Xen, and later, whenever IORT
parsing is required, the Xen internal data structures
(iort_table_struct) are used.

     - this would be helpful for (b) below
     - and for SMMU / platform device initialization.
       If you look at [2], it parses the IORT again.
       So the approach here is that [2] and (b) should use the same
Xen internal IORT data structures.
I think I get this, I was just wondering why we would need a more or
less exact replication of the IORT,
Yes, but I have a context pointer which points to the IORT node, so
not everything is replicated.
with the pointering complicating
things.
How are pointers complicating things? In which case do you think there
would be extra pointer handling? I was in fact planning to add a back
pointer so that, given an ITS ID, we can trace back to the SMMU and
PCI_RC.
I believe what we need is:
1) a mapping from a PCI-RC or PT-NC to stream IDs, for programming the
SMMU in Xen
2) a mapping from a PCI-RC or PT-NC to ITS devIDs, for programming the
ITS when being asked for by a guest (incl. Dom0)
I think that the IORT is a streamlined and optimized representation of
those mappings, which we don't necessarily need to replicate 1:1 in an
in-memory data structure.
We might replicate the skeleton 1:1.
As I explained above, otherwise we would inadvertently be parsing
multiple lists.
But admittedly I haven't looked with too much details into this, so if
you convince me that we need this graph structure, then so be it.
Keeping a PCI_RC->ITS mapping and a PCI_RC->SMMU mapping might not be
sufficient for all the cases the spec supports. With a graph structure,
all cases can be handled.
=>      For that reason [2]/[5]  might need to be rebased on this task's
patch. <=
[5] https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxx/msg123080.html

b. Generate the IORT for Doms without patching the host IORT; rather,
regenerate it from the Xen internal data structures.

Based on this rationale, I think the data structures mentioned would be
required.

6. IORT Generation
-------------------
There would be common code to generate the IORT table from
iort_table_struct.
That sounds useful, but we would need to be careful with sharing code
between Xen and the tool stack. Has this actually been done before?
I added the code sharing part here, but I am not hopeful that this
would work, as it would require a lot of code changes in the toolstack.
A simple difference is that the ACPI header structures have different
member variables. The same holds for other structures.
So we might have to create a lot of defines in the common code for
sharing, with the possibility of errors.

See: struct acpi_header in acpi2_0.h (tools/libacpi)
and struct acpi_table_header in actbl.h (xen/include/acpi)
What do you think about this difference in the basic structures between
the toolstack and the Xen code? When we write a common library, should
I include #defines mapping the Xen structures to the toolstack ones?
Whether that has more overhead than duplication is an implementation
issue.
That is why I preferred a domctl, so Xen could prepare the IORT for the
DomU.
I don't this it's justified to move a simple table generation task
into Xen, just to allow code sharing. After all this does not require
any Xen internal knowledge. So it should be done definitely in the
toolstack.
Yes. Fully agree.
The point here is duplication or code reuse.
See above.

I think we should follow Julien's suggestion of looking at xen/common/libelf.
ok
Cheers,
Andre.

If not code sharing, then code duplication might also work (in that
case no domctl is required).
We can discuss this more...
a. For Dom0
      the structure (iort_table_struct) is modified to remove SMMU
nodes
      and update the ID mappings:
      PCIRC idmap -> output reference to the ITS group
      (RID -> DeviceID).

      TODO: describe the algorithm in the update_id_mapping function
used to map RID -> DeviceID in my earlier patch [3]
If the above approach works, this would become a simple list iteration,
creating PCI RC nodes with the appropriate pointer to the ITS nodes.
Yes, it works. See [3].
b. For DomU
      - iort_table_struct would have a minimum of 2 nodes (1 PCIRC and
1 ITS group)
      - populate a basic IORT in a buffer passed by the toolstack
(using a domctl: domctl_prepare_dom_iort)
I think we should reduce this to iterating the same data structure as
for Dom0. Each pass-through-ed PCI device would possibly create one
struct instance, and later on we do the same iteration as we do for
Dom0. If that proves to be simple enough, we might even live with the
code duplication between Xen and the toolstack.
Yes, that's the idea. For DomU and Dom0, the IORT generation code would
take iort_table_struct as input.

Cheers,
Andre.

      - The DSDT for the DomU is updated by the toolstack to include a
PCIRC.
      - If a Named Component is added to a DomU, that information is
passed in the same/an additional domctl.
          - <TODO: domctl_prepare_dom_iort i/f>
      Note: Julien, I have tried to incorporate your suggestion for
code reuse.

7. References:
-------------
[0] https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxx/msg121667.html
[1] ARM DEN0049C:
http://infocenter.arm.com/help/topic/com.arm.doc.den0049c/DEN0049C_IO_Remapping_Table.pdf


[2] https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxx/msg123082.html
[3] https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxx/msg121669.html:
update_id_mapping function.
[4] https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxx/msg123434.html




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

