Re: Hypercall fault injection (Was [PATCH 0/3] xen/domain: More structured teardown)
 
 
 
 On 22/12/2020 10:00, Jan Beulich wrote: 
> On 21.12.2020 20:36, Andrew Cooper wrote: 
>> Hello, 
>> 
>> We have some very complicated hypercalls, createdomain chief among 
>> them and max_vcpus a close second, with immense complexity and very 
>> hard-to-test error handling. 
>> 
>> It is no surprise that the error handling is riddled with bugs. 
>> 
>> Random failures from core functions are one way, but I'm not sure that 
>> will be especially helpful.  In particular, we'd need a way to exclude 
>> "dom0 critical" operations so we've got a usable system to run testing on. 
>> 
>> As an alternative, how about adding a fault_ttl field into the hypercall? 
>> 
>> The exact paths taken in {domain,vcpu}_create() are sensitive to the 
>> hardware, Xen Kconfig, and other parameters passed into the 
>> hypercall(s).  The testing logic doesn't really want to care about what 
>> failed; simply that the error was handled correctly. 
>> 
>> So a test for this might look like: 
>> 
>> cfg = { ... }; 
>> while ( xc_create_domain(xch, cfg) < 0 ) 
>>     cfg.fault_ttl++; 
>> 
>> 
>> The pros of this approach are that for a specific build of Xen on a 
>> piece of hardware, it ought to check every failure path in 
>> domain_create(), until the ttl finally gets higher than the number of 
>> fail-able actions required to construct a domain.  Also, the test 
>> doesn't need changing as the complexity of domain_create() changes. 
>> 
>> The main con will most likely be the invasiveness of code in Xen, but 
>> I suppose any fault injection is going to be invasive to a certain extent. 
> While I like the idea in principle, the innocent looking 
> 
> cfg = { ... }; 
> 
> is quite a bit of a concern here as well: Depending on the precise 
> settings, paths taken in the hypervisor may heavily vary, and hence 
> such a test will only end up being useful if it covers a wide 
> variety of settings. Even if the number of tests to execute turned 
> out to still be manageable today, it may quickly turn out not 
> sufficiently scalable as we add new settings controllable right at 
> domain creation (which I understand is the plan). 
 
Well - there are two aspects here. 
 
First, 99% of all VMs in practice use one of 3 or 4 configurations.  
Testing an individual configuration with fault_ttl is O(n) in the number 
of fail-able steps in Xen's logic, and we absolutely want to be able to 
test these common configurations deterministically and to completion. 
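 
Concretely, for one of those common configurations, the loop from the
earlier sketch might flesh out to something like this.  Minimal sketch
only: it assumes a hypothetical fault_ttl field in struct
xen_domctl_createdomain (no such field exists today), with the semantics
"fail the Nth fail-able step in domain_create()"; everything else is the
existing libxc interface.

#include <xenctrl.h>

int main(void)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    struct xen_domctl_createdomain cfg = {
        .max_vcpus = 1,
        .max_evtchn_port = -1,
        .max_grant_frames = 64,
        .max_maptrack_frames = 1024,
        .fault_ttl = 1,    /* hypothetical: fail the first fail-able step */
    };
    uint32_t domid = 0;

    if ( !xch )
        return 1;

    /*
     * Each iteration asks Xen to fail one step deeper into domain_create().
     * The loop terminates once fault_ttl exceeds the number of fail-able
     * steps, i.e. once every error path for this config has been exercised.
     */
    while ( xc_domain_create(xch, &domid, &cfg) < 0 )
        cfg.fault_ttl++;

    xc_domain_destroy(xch, domid);
    xc_interface_close(xch);
    return 0;
}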
 
For the plethora of other configurations, I agree that it is infeasible 
to test them all.  However, a hypercall like this is easy to wire up 
into a fuzzing harness. 
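 
For example (hypothetical harness, nothing like this exists in tree): an
AFL-style wrapper only needs to read the mutated bytes and throw them at
the hypercall.  Whether creation succeeds is irrelevant; we only care
that Xen survives and takes the error paths cleanly.

#include <stdio.h>
#include <xenctrl.h>

int main(int argc, char **argv)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    struct xen_domctl_createdomain cfg = {};
    uint32_t domid = 0;
    FILE *f;

    if ( !xch || argc < 2 || !(f = fopen(argv[1], "rb")) )
        return 1;

    /* Let the fuzzer mutate the raw config; short reads leave the rest zeroed. */
    if ( !fread(&cfg, 1, sizeof(cfg), f) )
        return 1;
    fclose(f);

    /* Success or failure are equally fine; clean up if it did succeed. */
    if ( xc_domain_create(xch, &domid, &cfg) == 0 )
        xc_domain_destroy(xch, domid);

    xc_interface_close(xch);
    return 0;
}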
 
TBH, I was thinking of something like 
https://github.com/intel/kernel-fuzzer-for-xen-project with a PVH Xen 
and XTF "dom0" poking just this hypercall.  All the other complicated 
bits of wiring AFL up appear to have been done. 
 
Perhaps when we exhaust that as a source of bugs, we move on to fuzzing 
the L0 Xen, because running on native will give it more paths to 
explore.  We'd need some way of reporting path/trace data back to AFL in 
dom0, which might require a bit of plumbing. 
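 
On the invasiveness point: the Xen-side hook for fault_ttl needn't be
large either.  A purely hypothetical sketch (field and helper names
invented), with each fail-able step in domain_create() doing
"if ( consume_fault_ttl(config) ) goto fail;" before its real work:

/*
 * Hypothetical: fault_ttl counts fail-able steps; 0 disables injection.
 * Returns true exactly once, on the Nth fail-able step.
 */
static bool consume_fault_ttl(struct xen_domctl_createdomain *config)
{
    if ( !config->fault_ttl )
        return false;

    return --config->fault_ttl == 0;
}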
 
This is a pretty cool idea; I would be very interested in trying this 
out.  If running Xen nested in an HVM domain is possible (my experiments 
with nested setups using Xen have only worked on ancient hardware the 
last time I tried), then running the fuzzer would be entirely possible 
using VM forks. 
 
You don't even need a special "dom0": you could just add the fuzzer's 
CPUID harness to Xen's hypercall handler, and the only thing needed from 
the nested dom0 would be to trigger the hypercall with a normal config. 
The fuzzer would take it from there and replace the config with the 
fuzzed version directly in the VM forks. 
 
What to report as a "crash" to AFL would still need to be defined 
manually for Xen, as the current sink points are Linux-specific 
(https://github.com/intel/kernel-fuzzer-for-xen-project/blob/master/src/sink.h), 
but that should be straightforward. 
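 
To illustrate the harness idea (placeholder values, not KF/x's
documented ABI): the marker in the nested Xen could be as small as a
CPUID with a magic leaf, executed once on entry to and once on exit from
the XEN_DOMCTL_createdomain case in do_domctl(), so the L0 fuzzer knows
where to start and stop the VM forks.

/*
 * Illustrative only: the CPUID exits to the L0 fuzzer, which treats it
 * as the start/stop marker for the fuzzed region.  0x13371337 is a
 * placeholder leaf - check KF/x's documentation for the real magic.
 */
static void fuzz_harness_marker(void)
{
    unsigned int eax = 0x13371337, ebx = 0, ecx = 0, edx = 0;

    asm volatile ( "cpuid"
                   : "+a" (eax), "+b" (ebx), "+c" (ecx), "+d" (edx) );
}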
 
Also, running the fuzzer with PVH guests hasn't been tested, but since 
all VM forking needs is EPT, it should work. 
 
 Tamas 
 