[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 03/20] xen/riscv: introduce VMID allocation and manegement
On 06.08.2025 13:33, Oleksii Kurochko wrote: > On 8/4/25 5:19 PM, Jan Beulich wrote: >> On 31.07.2025 17:58, Oleksii Kurochko wrote: >>> @@ -148,6 +149,8 @@ void __init noreturn start_xen(unsigned long bootcpu_id, >>> >>> console_init_postirq(); >>> >>> + vmid_init(); >> This lives here only temporarily, I assume? Every hart will need to execute >> it, and hence (like we have it on x86) this may want to be a central place >> elsewhere. > > I haven’t checked how it is done on x86; I probably should. > > I planned to call it for each hart separately during secondary hart bring-up, > since accessing the|hgatp| register of a hart is required to detect|VMIDLEN|. > Therefore,|vmid_init()| should be called for secondary harts when their > initialization code starts executing. But is this going to be the only per-hart thing that will need doing? Otherwise the same larger "container" function may want calling instead. >>> +static unsigned long vmidlen_detect(void) >> __init ? Or wait, are you (deliberately) permitting different VMIDLEN >> across harts? > > All what I was able in RISC-V spec is that: > The number of VMID bits is UNSPECIFIED and may be zero. The number of > implemented VMID bits, termed VMIDLEN, may be determined by writing one > to every bit position in the VMID field, then reading back the value in > hgatp to see which bit positions in the VMID field hold a one. The least- > significant bits of VMID are implemented first: that is, if VMIDLEN > 0, > VMID[VMIDLEN-1:0] is writable. The maximal value of VMIDLEN, termed > VMIDMAX, is 7 for Sv32x4 or 14 for Sv39x4, Sv48x4, and Sv57x4.. > And I couldn't find explicitly that VMIDLEN will be the same across harts. > > Therefore, IMO, while the specification doesn't guarantee VMID will be > different, the "unspecified" nature and the per-hart discovery mechanism > of VMIDLEN in the hgatp CSR allows for VMIDLEN to be different on > different harts in an implementation without violating the > RISC-V privileged specification. Okay, since that's easily feasible with the present implementation, why not keep it like that then. >>> +{ >>> + unsigned long vmid_bits; >> Why "long" (also for the function return type)? > > Because csr_read() returns unsigned long as HGATP register has > 'unsigned long' length. Oh, right, I should have commented on the function return type only. Yet then I also can't resist stating that this kind of use of a variable, which initially is assigned a value that doesn't really fit its name, is easily misleading towards giving such comments. > But it could be done in this way: > csr_write(CSR_HGATP, old | HGATP_VMID_MASK); > vmid_bits = MASK_EXTR(csr_read(CSR_HGATP), HGATP_VMID_MASK); > vmid_bits = ffs_g(vmid_bits); > csr_write(CSR_HGATP, old); > And then use uint16_t for vmid_bits and use uin16_t as a return type. Please check ./CODING_STYLE again as to the use of fixed-width types. >>> + unsigned long old; >>> + >>> + /* Figure-out number of VMID bits in HW */ >>> + old = csr_read(CSR_HGATP); >>> + >>> + csr_write(CSR_HGATP, old | HGATP_VMID_MASK); >>> + vmid_bits = csr_read(CSR_HGATP); >>> + vmid_bits = MASK_EXTR(vmid_bits, HGATP_VMID_MASK); >> Nit: Stray blank. >> >>> + vmid_bits = flsl(vmid_bits); >>> + csr_write(CSR_HGATP, old); >>> + >>> + /* >>> + * We polluted local TLB so flush all guest TLB as >>> + * a speculative access can happen at any time. >>> + */ >>> + local_hfence_gvma_all(); >> There's no guest running. If you wrote hgat.MODE as zero, as per my >> understanding now new TLB entries could even purely theoretically appear. > > It could be an issue (or, at least, it is recommended) when hgatp.MODE is > changed: > If hgatp.MODE is changed for a given VMID, an HFENCE.GVMA with rs1=x0 > (and rs2 set to either x0 or the VMID) must be executed to order subsequent > guest translations with the MODE change—even if the old MODE or new MODE > is Bare. > On other hand it is guaranteed that, at least, on Reset (and so I assume > for power on) that: > If the hypervisor extension is implemented, the hgatp.MODE and vsatp.MODE > fields are reset to 0. > > So it seems like if no guest is ran then there is no need even to write > hgatp.MODE as zero, but it might be sense to do that explicitly just to > be sure. > > I thought it was possible to have a running guest and perform a CPU hotplug. But that guest will run on another hart. > In that case, I expect that during the hotplug,|vmidlen_detect()| will be > called and return the|vmid_bits| value, which is used as the active VMID. > At that moment, the local TLB could be speculatively polluted, I think. > Likely, it makes sense to call vmidlen_detect() only once for each hart > during initial bringup. That may bring you more problems than it solves. You'd need to stash away the value originally read somewhere. And that somewhere isn't per-CPU data. >> In fact, with no guest running (yet) I'm having a hard time seeing why >> you shouldn't be able to simply write the register with just >> HGATP_VMID_MASK, i.e. without OR-ing in "old". It's even questionable >> whether "old" needs restoring; writing plain zero afterwards ought to >> suffice. You're in charcge of the register, after all. > > It make sense (but I don't know if it is a possible case) to be sure that > HGATP.MODE remains the same, so there is no need to have TLB flush. If > HGATP.MODE is changed then it will be needed to do TLB flush as I mentioned > above. > > If we agreed to keep local_hfence_gvma_all() then I think it isn't really > any sense to restore 'old' or OR-ing it with HGATP_VMID_MASK. > > Generally if 'old' is guaranteed to be zero (and, probably, it makes sense > to check that in vmidlen_detect() and panic if it isn't zero) and if > vmidlen_detect() function will be called before any guest domain(s) will > be ran then I could agree that we don't need local_hfence_gvma_all() here. > > As an option we can do local_hfence_gvma_all() only if 'old' value wasn't > set to zero. > > Does it make sense? Well - I'd like the pre-conditions to be understood better. For example, can a hart really speculate into guest mode, when the hart is only in the process of being brought up? >>> + data->max_vmid = BIT(vmid_len, U) - 1; >>> + data->disabled = !opt_vmid_enabled || (vmid_len <= 1); >> Actually, what exactly does it mean that "VMIDs are disabled"? There's >> no enable bit that I could find anywhere. Isn't it rather that in this >> case you need to arrange to flush always on VM entry (or always after a >> P2M change, depending how the TLB is split between guest and host use)? > > "VMIDs are disabled" here means that TLB flush will happen each time p2m > is changed. That's better described as "VMIDs aren't used" then? >>> + if ( g_disabled != data->disabled ) >>> + { >>> + printk("%s: VMIDs %sabled.\n", __func__, >>> + data->disabled ? "dis" : "en"); >>> + if ( !g_disabled ) >>> + g_disabled = data->disabled; >> This doesn't match x86 code. g_disabled is a tristate there, which only >> the boot CPU would ever write to. > > Why g_disabled is written only by boot CPU? Does x86 have only two options > or VMIDs are enabled for all CPUs or it is disabled for all of them? Did you look at the x86 code again, or the patch that I sent for it? > For RISC-V as I mentioned above it is needed to check all harts as the spec. > doesn't explicitly mention that VMIDLEN is equal for all harts... Even if in practice x86 systems are symmetric in this regard, you will have seen that we support varying values there as well. Up to and including ASIDs being in use on some CPUs, but not on others. So that code can serve as a reference for you, I think. >> A clear shortcoming of the x86 code (that you copied) is that the log >> message doesn't identify the CPU in question. A sequence of "disabled" >> and "enabled" could thus result, without the last one (or in fact any >> one) making clear what the overall state is. I think you want to avoid >> this from the beginning. > > ... Thereby it seems like declaration of g_disabled should be moved outside > vmid_init() function and add a new function which will return g_disabled > value (or just make g_disabled not static and rename to something like > g_vmids_disabled). No, why? While I didn't Cc you on my patch submission, I specifically replied to it with you (alone) on the To: list, just so you can look there first before suggesting (sorry) odd things. Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |