
Re: [Xen-devel] Xen and safety certification, Minutes of the meeting on Apr 4th



Hi Jarvis

On 06.04.18 20:01, Jarvis Roach wrote:
Hi all,

adding a few more people who are/may be interested in safety certification,
including committers (because item 1 would have an impact). Specifically:
Rich Persaud, Paul Luperto, Jonathan Daugherty and Denys Balatsko.

There are a few loose ends and updates from other/similar related threads
that we should pull into this thread:

a) AGL Whitepaper
This is out as far as I can tell
See
https://docs.google.com/document/d/1HpYzClh0nDEocsUHb17X0DxiehsAbCgyWE-P2Wk_RNU/edit#
Thank you to Rich for driving this and to all the contributors from the Xen
Community

Related to this is the following item from the original minutes
AGL will select 2 hypervisors out of the list. Artem has already an
out-of-the-box solution for AGL. Artem will chase up and make sure
that Xen will be one of the two.

b) Genivi AMM Hypervisor Workshop, Apr 19
Artem and I will be speaking on various Xen related projects. I will send a
draft PDF to this list later this week.
Slots are short: 10 minutes + questions each slot. See
https://at.projects.genivi.org/wiki/display/DIRO/Hypervisor+Workshop+Team

c) Xen Specific Automotive Whitepaper
This was discussed during a) and I think it would be relatively easy to pull
something together. It would be good if someone other than me could lead
this. We have a lot of information already, but more ground-work on safety
certification may help. Would there be a volunteer driving this? It could be
used as a vehicle to move some of the items discussed in the minutes
along.

d) I also created
https://wiki.xenproject.org/wiki/Category:Safety_Certification to start
pulling material relevant to safety and context for it into one place.
It's a little crude at this point in time and I expect this document to evolve
and split into smaller parts.
It would be good if someone on this list could go over
https://wiki.xenproject.org/wiki/Category:Safety_Certification#Automotive_Requirements
and map the requirements to functionality we already have.
This could then feed into c.

Any takers?

Artem suggested writing a whitepaper about Xen real-time capabilities.
Stefano volunteered to help.
I believe we have some gaps with regards to real-time requirements and
that paper is aiming to highlight these.
@Artem: maybe this would be a suitable topic for the developer summit
(amongst others).
As a reminder: the CfP for the summit closes next Friday


One of my engineers has highlighted the need to move Xen to use preemptive 
locks (similar to what was done with the Linux RT patch updates) before it can 
be considered hard real-time. Right now we've been pitching it as soft 
real-time.


I contacted Lars (CC'ed) who volunteered to help.
I am volunteering to act as a program/project manager for this activity. In
particular to bootstrap.

I think the only practicable way to make progress in this area is to set up
some mechanism which allows us to make progress towards the goal of
making it easier and cheaper to build safety certified variants of Xen. As a
side-effect of this process we should get data, to scope out the scale of the
problem further, that should enable getting more vendors interested.

The main topic of the meeting was certifications for Xen on ARM. The
gap analysis document, mentioned in the previous call, is copyrighted.
It might not be possible to relicense it. Regardless of the document,
we started discussing the major work items and next steps.

@Stefano: Thanks for driving this discussion. I re-ordered some of the items,
to make it more palatable

2) Create a subset of functions that need to go through certifications
Next step: create a small Kconfig. We could use the Renesas R-Car as
reference. We need a discussion about the features we need: for
example, do we need real-time schedulers or not?
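
To make the discussion about a small Kconfig concrete, it may help to list which options a candidate minimal configuration drops relative to a baseline. The following is only a sketch: it assumes the usual Kconfig-style .config text format (CONFIG_NAME=value lines, with disabled options recorded as comments), and the function names and option names in the usage note are hypothetical, not taken from the Xen tree.

```python
def enabled_options(config_text):
    """Return {option: value} for every option set in a Kconfig-style .config.

    Lines such as '# CONFIG_FOO is not set' are comments and are skipped,
    so disabled options do not appear in the result.
    """
    options = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.startswith("CONFIG_"):
            options[name] = value
    return options


def removed_options(baseline_text, minimal_text):
    """Options enabled in the baseline config but not in the minimal one."""
    baseline = enabled_options(baseline_text)
    minimal = enabled_options(minimal_text)
    return sorted(set(baseline) - set(minimal))
```

For example, with a baseline that sets CONFIG_FEATURE_A and CONFIG_FEATURE_B (placeholder names) and a minimal config that marks CONFIG_FEATURE_B as not set, removed_options() returns ["CONFIG_FEATURE_B"]. The resulting list is the set of features that would need a decision on whether they are in or out of the certified subset.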


Identifying this subset is very important. My recommendation would be to 
identify the very smallest subset to start with that supports a single, high 
value use case, which I would suggest is consolidation of Linux and real-time 
applications with mixed criticality, but not necessarily shared/PV I/O, onto a 
single processing cluster. Identifying the highest reasonable safety 
criticality to support would also be very helpful.


Unfortunately in mixed criticality systems (at least in automotive) we see a lot of attention to performance and , so processing cluster partitioning may not be well accepted in the industry

At the Xen level, you might get away with just the null scheduler if VMs are 
pinned to their own cores (and jitter caused by contention on the bus and in 
the cache is acceptable). However, to do CAST-32a type scheduling (effectively 
time slicing the SoC between your VMs), an updated ARINC-653 scheduler would be 
needed.
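
To illustrate what "time slicing the SoC between your VMs" means, here is a minimal sketch of a fixed, repeating schedule in the ARINC-653 style. It is not the Xen ARINC-653 scheduler and makes simplifying assumptions: the major frame is just the sum of the slots, time is in whole milliseconds, and the domain names are made up.

```python
def build_major_frame(slots):
    """Lay out (domain, runtime_ms) slots back to back.

    Returns (timeline, major_frame_ms), where timeline is a list of
    (start_ms, end_ms, domain) windows covering one major frame.
    """
    timeline = []
    start = 0
    for domain, runtime_ms in slots:
        if runtime_ms <= 0:
            raise ValueError("each slot needs a positive runtime")
        timeline.append((start, start + runtime_ms, domain))
        start += runtime_ms
    return timeline, start


def domain_at(timeline, major_frame_ms, t_ms):
    """Return the domain that owns the SoC at absolute time t_ms.

    The schedule repeats every major frame, so only the offset within
    the current frame matters.
    """
    offset = t_ms % major_frame_ms
    for start, end, domain in timeline:
        if start <= offset < end:
            return domain
    return None  # not reached when the slots cover the whole frame
```

With slots [("rtos", 2), ("linux", 8)] the major frame is 10 ms: the "rtos" domain owns the first 2 ms of every frame and "linux" the remaining 8 ms, regardless of load. That fixed ownership is what distinguishes this from pinning VMs to cores under the null scheduler.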


We are now looking into RTDS as a possible solution for industrial or automotive domains. Also, from our experience, bus/cache contention in systems with high load is actually an issue... Looking into that, too


@Stefano agreed to drive this.
The minimal configuration does impact 1 and 2, which is why I moved this
first.

We should probably agree a basic process: aka
* Measure baseline size in KSLOC
* Remove some feature
* Measure reduction in KSLOC
And record the data somewhere
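
The measure/remove/measure loop above could be scripted roughly as follows. This is a sketch under stated assumptions: "SLOC" is approximated as non-blank lines, only files ending in .c, .h or .S are counted, and the function names are invented for illustration. It counts every matching file under a directory, so it only reflects a removed feature if that feature's files are actually deleted or excluded from the tree being measured; counting only the files that go into a given build would be more accurate.

```python
import os

# Assumption: which file suffixes count as "source" for this measurement.
SOURCE_SUFFIXES = (".c", ".h", ".S")


def count_sloc(root):
    """Count non-blank lines in source files under root (a crude SLOC proxy)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(SOURCE_SUFFIXES):
                path = os.path.join(dirpath, name)
                with open(path, errors="replace") as handle:
                    total += sum(1 for line in handle if line.strip())
    return total


def reduction_report(baseline_sloc, reduced_sloc):
    """Summarise the effect of removing a feature, in KSLOC and percent."""
    saved = baseline_sloc - reduced_sloc
    percent = 100.0 * saved / baseline_sloc if baseline_sloc else 0.0
    return {
        "baseline_ksloc": baseline_sloc / 1000.0,
        "reduced_ksloc": reduced_sloc / 1000.0,
        "saved_ksloc": saved / 1000.0,
        "saved_percent": percent,
    }
```

Running count_sloc() before and after each removal and appending the reduction_report() result to a shared file or wiki page would give the recorded data trail.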

1) Requirements to the code, a subset of MISRA for ASIL B
Next step: get more information about requirements and publish it to
xen-devel.

I see a few problems here:

* The MISRA C:2012 spec has to be bought and it is rather big (100s of
pages), so I don't think it is practical to work from the spec

* Some coding style patterns will likely be perceived as odd and
unreasonable by community members: as some common code would be
affected we cannot treat this in isolation say on ARM only. Although it is
recognized that some of the coding style patterns may not make sense,
compliance to MISRA is necessary and cannot normally be discussed away.

* PRQA has set up an environment and initial MISRA compliance report for
a Xen on ARM build
** The question is what (if anything) can be shared publicly
** The other open question is whether we can come to some sort of longer
term agreement between the Xen Project and PRQA to use their tools
** As an aside, what PRQA have done would need to reflect what we do in
step 2. We also want to minimize the work for PRQA: in other words, it
has to be very simple to enable the minimal config coming out of task 2
such that PRQA can
** As far as I recall 90% of all MISRA violations come down to around 70
issues. A large number are in tools
** Also, I believe that MISRA compliance tools will likely lead to a large
amount of false positives, due to the distributed nature of Xen: process
boundaries, kernel/user space boundaries, etc. would all lead to false
positives, which somehow have to be managed.

ACTION => Lars to follow up with Paul Luperto from PRQA

* An approach that may be manageable would be to look at the most
common MISRA violations and work backwards from there.
** This would make the problem more manageable and mean people
wouldn't have to read a long spec
** Discussing a small set of issues, would give us a sense of whether/what
type of disagreements there are and how we resolve them.
** We should prioritize based on:
a) Address/discuss the most frequently occurring issues first
b) Address/discuss issues in common code first
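
The prioritization above could be applied to a tool report along these lines. This is a sketch only: it assumes the report can be reduced to (rule id, file path) pairs, it treats paths under "xen/common/" as common code (an assumption about what "common" should mean here), and the rule ids in the usage note are placeholders rather than real MISRA rule numbers.

```python
from collections import Counter


def prioritise(violations, common_prefixes=("xen/common/",)):
    """Order rule ids for discussion.

    violations: iterable of (rule_id, file_path) pairs.
    Rules are sorted by total number of violations (most frequent first);
    ties are broken by how many of those violations are in common code.
    """
    total = Counter()
    in_common = Counter()
    for rule, path in violations:
        total[rule] += 1
        if path.startswith(common_prefixes):
            in_common[rule] += 1
    return sorted(total, key=lambda rule: (-total[rule], -in_common[rule], rule))


def rules_covering(violations, fraction=0.9):
    """Smallest set of most frequent rules covering `fraction` of violations."""
    counts = Counter(rule for rule, _path in violations)
    needed = fraction * sum(counts.values())
    covered = 0
    chosen = []
    for rule, count in counts.most_common():
        if covered >= needed:
            break
        chosen.append(rule)
        covered += count
    return chosen
```

For instance, given two violations of "rule-A" (one in common code), two of "rule-B" (both in common code) and one of "rule-C" in tools, prioritise() yields ["rule-B", "rule-A", "rule-C"]. rules_covering() gives a quick check of how many distinct rules account for a given share of the violations.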

At the very least (and for now in absence of the capability to check
compliance), I would need someone who has access to MISRA compliance
tools, to drive such an effort.

3) Understand how to address dom0. FreeRTOS Dom0 sounds like a good
solution.
Next step: reach out to Dornerworks and/or others that worked with
FreeRTOS on Xen before. Figure out whether FreeRTOS is actually a
suitable solution and what needs to be done to run FreeRTOS as Dom0.

Some things to check at this stage:
a) I believe there is a safety certified version of FreeRTOS - I could not find
much, except for
https://www.freertos.org/FreeRTOS-Plus/Safety_Critical_Certified/SafeRTOS-Safety-Critical-Certification.shtml
which describes SafeRTOS, a commercial, safety certified and (mostly) API
compliant version of FreeRTOS. Or am I missing something here?
b) There is a DomU capable version from Galois (Jonathan Daugherty CC'ed) -
I don't know whether others also have such versions

I ported the version of FreeRTOS that Xilinx distributes with their SDK to run 
as a domU on the ZUS+ in 2016 and round tripped the change set back to Richard 
Barry.
I've also heard interest in running RTEMS as a guest OS.


We've had experience in running QNX in domU, but that was not very welcomed by BB QSSL folks back then :) They don't really like OSS

Since I do not think that a previously certified OS will be available for free, 
I see 3 general approaches wrt dom0:
1) Find and certify an open source OS. My guess is this will not be Linux due 
to code base size. POSIX support a plus.
2) Use a commercially available, previously certified OS for dom0. DW ported 
VxWorks to run on Xen in 2017 and uc/OS-III in 2016.
3) Go with a dom0-less solution; bootloader starts up the necessary VMs based 
on a static configuration.

The XL toolstack in its current form will likely cause cert issues and will 
probably need to be stripped down and/or rewritten.
Bootloader (U-Boot, GRUB, or whatever) will also need to be certified.


We'd like to explore both FreeRTOS in dom0 and dom0-less options. I think there were some patches a while ago for dom0-less Xen.

c) There is a POSIX wrapper, which may be needed, but it is unclear what
this would do to the FreeRTOS footprint
d) In other words, what we would have to do is to investigate whether it is
possible to build a Dom0 capable FreeRTOS

I see several ways of approaching this:
a) A vendor (or groups of vendors) on this list steps up
b) We go initially for a lower bar: aka we try and scope out and cost the
creation of a Dom0 capable FreeRTOS and then look at how the work can
get funded

A very good starting point would be to get a list of parties that are
interested in having and using a FreeRTOS based Dom0 (regardless of how
we get there). A show of hands would be good.
> DW is interested in participating in exploring ways to solve the
dom0 problem (be it FreeRTOS or other approaches).

Some insights from anyone on the FreeRTOS/SafeRTOS relationship and
politics would be good also. Unless there is a route from FreeRTOS
upstream to the certified version, someone in our eco-system would have
to safety certify FreeRTOS (which may not be such a big deal given the fairly
small size of FreeRTOS).


This link helps explain the relationship between FreeRTOS and SafeRTOS:
        
https://www.highintegritysystems.com/safertos/upgrade-from-freertos-to-safertos/


4) Create artifacts, such as docs, fault analysis, prove fault
tolerance, safety management docs, development processes.
Next step: we need to bring in a company, a certification body, to
guide us through the process.

We have companies such as Dornerworks on this list which are experienced
with safety certification on Xen for some safety standards: it is not clear to
me how much of this is transferable to automotive.


Papers have shown that there is a lot of overlap between the artifacts and 
processes defined in different safety standards.

However, someone with more experience in automotive safety should speak to the 
concern of non-determinism/jitter in that market. Aviation certification 
authorities are practically rabid about it, and you have to go to great lengths 
to satisfy them (disable interrupts, flush cache between partitions, prove that 
silicon vendor's secret features are all disabled, etc) which might be overkill 
for automotive.


Indeed, we need to analyze safety in different domains, but at least all derivatives of IEC 61508 (ISO 26262, etc.) have a common baseline. I am not sure about medical/aerospace/military though - we do not have expertise there.


Here my understanding is that we need a certification partner like TÜV,
MIRA or a company like Dornerworks who already have experience with
Xen. By working with a partner experienced in certification, the overall cost
of certification would be significantly reduced. The elephant in the room is
funding and a business model (aka all the items listed in
https://docs.google.com/document/d/1HpYzClh0nDEocsUHb17X0DxiehsAbCgyWE-P2Wk_RNU/edit
section 4.1). The reality is that organisations such
as TÜV, MIRA, Dornerworks, ... will need to be paid by someone. Which, I
think we need to park for now.


I wouldn't leave it parked too long. The issues of funding and remuneration 
will delay/derail progress more than all of the technical challenges combined.


What I think are sensible goals for now are
a) Establish a list of potential partners and start establishing contacts - such
conversations would need to be led by a vendor, otherwise it will go
nowhere. What would be good though is to have a shared (but possibly
private) repository of how these conversations have gone.
b) Otherwise focus on tasks 1-3 which deal with some issues listed in
https://www.slideshare.net/xen_com_mgr/art-certification, which is still
very valid
c) Engage/work with other groups (AGL, Genivi, Linaro) who are also
looking at this problem

It may be worth considering, in the mid-term, some sort of pilot around a
small portion of the Xen codebase: the aim would be to gather data that
helps establish what can be done in a collaborative FOSS environment.

Feedback/views are very welcome

Regards
Lars



Cheers!
-Jarvis



 

