Missing DMA_MASK in IOMMU specification leaves application problems in the real operating system #391
Comments
The IOMMU provides the address that was submitted for address translation in the fault record. Software may apply masks as appropriate to the reported addresses.
The addresses presented to the IOMMU are addresses generated by the IP. If the IP cannot generate an address beyond, say, 16 bits, then the IOMMU would not see any address bits beyond what the IP can generate in a translation request. There is no requirement to wire the most significant bits to all 1s, etc.
Then we could introduce a configurable "DMA_MASK" option in the reference model that skips checking of the most significant bits beyond the configured DMA_MASK, so that the model can also be used for real SoC verification where such DMA masters are integrated.
That would be the incorrect thing to do. The IOMMU should prevent misbehaving, misprogrammed and/or malicious devices from accessing memory that they are not authorized to access. Masking the address will, in the best case, hide the misprogramming and, in the worst case, hide malicious behavior or misbehavior.
IOMMU silicon developed to enforce the following code observes a wrong fault in Linux, preventing such DMA masters from being used:
IMO, we should alter this piece of code a bit to allow an SoC-specific DMA_MASK to be configured, so that no wrong fault is generated for DMA masters whose address width is less than 39 or less than 48. This is useful when a verification TB is built from the IOMMU, a DMAC (address width less than 39/48), and the reference model.
Consider a DMA master that can only generate addresses narrower than 48 bits, say 16-bit wide addresses. Such a DMA master can legally generate only addresses 0x000000000000 through 0x00000000FFFF. When Sv48 is in use, all of these addresses are legal. The ranges of legal addresses for Sv48 are 0x0000'0000'0000'0000 to 0x0000'7FFF'FFFF'FFFF and 0xFFFF'8000'0000'0000 to 0xFFFF'FFFF'FFFF'FFFF. Could you please give an example of such an address?
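To make the two Sv48 ranges quoted above concrete, here is a minimal Python sketch that derives them from the mode width. The helper names `canonical_ranges` and `in_ranges` are made up for this illustration and do not come from the specification or the reference model.

```python
def canonical_ranges(n):
    """Return the two inclusive (low, high) ranges of valid 64-bit VAs for Sv<n>."""
    half = 1 << (n - 1)                           # size of each valid half
    positive = (0, half - 1)                      # upper bits all 0
    negative = ((1 << 64) - half, (1 << 64) - 1)  # upper bits all 1
    return positive, negative


def in_ranges(addr, ranges):
    """True if addr lies inside any of the given inclusive ranges."""
    return any(low <= addr <= high for low, high in ranges)


sv48 = canonical_ranges(48)
for low, high in sv48:
    print(f"{low:#018x} .. {high:#018x}")

# Every address a 16-bit master can emit lies in the positive range.
print(in_ranges(0x0000, sv48), in_ranges(0xFFFF, sv48))
```

Anything between the two printed ranges is non-canonical for Sv48 and would be reported as a fault.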
The real issue is seen with a DMAC in an SoC design whose master supports at most 44 bits, so the DMAC designer wires the higher bits of the DMA addresses to 0s. When we use such a DMAC in the Linux kernel with IOMMU remapping enabled, the following kernel panic can be seen due to the IOMMU VA checking logic, preventing a valid DMA transfer from being performed:
OK, I see. You mean 0x7fff_ffff_x000 shouldn't be a valid address for this kind of DMA device because its driver doesn't invoke dma_set_mask(44); as such, there is no need to worry about seeing an input address between 0xFFFF'8000'0000'0000 and 0xFFFF'FFFF'FFFF'FFFF.
One more concern is related to the IOMMU device capability. What if the IOMMU only accepts 44-bit input addresses, and the DMA masters are configured to use at most 44-bit input addresses as well?
In the example you posted, software probably created DMA mappings for:
It then programmed the device to copy from one page to another. The device can only generate 44-bit addresses, but it has been asked to DMA to a wider address. The device ends up dropping the upper bits and instead DMAs to the following addresses:
And this leads to a fault, as expected, since the DMA is to an address that was not mapped. From the print it seems like it's a G-stage fault? This clearly is a programming mistake. A device that can generate only 44-bit addresses should not be given an address at or above 2^44. Further, in this case it's unclear what a mask in the IOMMU could do: the address has already been masked by the device.
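The truncation described here can be sketched in a few lines of Python. The address below is a made-up example in the spirit of the 0x7fff_ffff_x000 value mentioned earlier (with x assumed to be F), and `truncate_to_device_width` is a hypothetical helper, not part of any driver.

```python
def truncate_to_device_width(addr, width):
    """Model a master that can only drive the low `width` address bits."""
    return addr & ((1 << width) - 1)


programmed = 0x0000_7FFF_FFFF_F000  # IOVA software mapped (assumed value)
on_the_bus = truncate_to_device_width(programmed, 44)

print(f"programmed: {programmed:#018x}")
print(f"on the bus: {on_the_bus:#018x}")
print("same address" if programmed == on_the_bus else "device hits a different address")
```

Because bits 46:44 are lost before the request reaches the IOMMU, no mask applied inside the IOMMU can recover the address that software actually mapped.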
The IOMMU accepts either 64-bit addresses (RV64) or 32-bit addresses (RV32). There is no specification for an IOMMU with a 44-bit input address.
OK, this makes the root cause clearer.
The PAS field reflects the bit width of the DMA masters or of the bus interconnects that the IOMMU is connected to. For these SoC components (DMA masters and bus interconnects), the higher bits are always truncated to 0s, a padding rule that is not compatible with the RISC-V virtual addresses used by CPU components. In OSes, these addresses are also known as bus addresses or DMA addresses. Thus PAS actually reflects the bit width of both the input address (which could be virtual) and the output address, so it is not really "physical". Shall we use bus address size (BAS) or DMA address size (DMA_SIZE) instead of PAS? And if possible, should we put this into the device context, since the address width capability varies from one DMA master to another?
The PAS field reflects the physical address space addressable by the IOMMU and not the bit width of DMA masters.
The PAS is not related to any of this. The PAS in the specification is what the IOMMU can address.
The driver of DMAC1 should only program an address below 2^40 and the driver of DMAC0 an address below 2^44.
If the bus of a DMA master is 40 bits wide then it should not be programmed with an address that is wider than 40 bits. Such a device will likely not even have registers to hold an address wider than 40 bits.
Such buses and interconnects are outside the scope of the IOMMU to specify.
For Sv39, an address with bit 38 = 1 is valid for such devices, since 38 is smaller than 40 or 44, and the OS kernel can have a DMA address remapped into the bit 38 = 1 region. OSes probably don't do any special bounce buffering for a direct I/O address when the VA size is smaller than the DMA size. And the bits from 39 up to the CPU bus width (e.g., 39 to 63) could be padded with bit 38.
When mode is configured as Sv39/Sv48/Sv57, the VA must be a 64-bit address. There are only two VA lengths - 64 and 32 - supported by the IOMMU. The addresses where bits 63:39 do not match bit 38 are not valid virtual addresses for Sv39. In general for mode SvN, the address bits 63:N must be equal to bit N-1 for the address to be a valid virtual address. Since the devices in this example are not 64-bit capable, they cannot be programmed with a 64-bit VA. They may be programmed with a PA or a GPA. If a VA needs to be used then the OS has to use Sv48 or Sv57 and map the VA below 2^40/2^44 for such devices.
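The SvN rule stated above (bits 63:N must equal bit N-1) can be written as a short check. This is an illustrative Python sketch, not code from the reference model; `is_valid_va` is a name chosen here.

```python
def is_valid_va(addr, n):
    """True if the 64-bit addr has bits 63:n all equal to bit n-1 (mode Sv<n>)."""
    upper = addr >> (n - 1)               # bits 63:n-1, i.e. 65-n bits
    all_ones = (1 << (64 - (n - 1))) - 1
    return upper == 0 or upper == all_ones


# Sv39 examples
print(is_valid_va(0x0000_003F_FFFF_FFFF, 39))  # bit 38 = 0, upper bits 0
print(is_valid_va(0x0000_0040_0000_0000, 39))  # bit 38 = 1, upper bits 0
print(is_valid_va(0xFFFF_FFC0_0000_0000, 39))  # bit 38 = 1, upper bits 1
```

The second address is what a 40-bit or 44-bit master would emit for a negative Sv39 address after dropping the upper bits, which is why the discussion keeps returning to that case.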
OK, we seem to be on the same page. Using the current Linux kernel driver, we can remove the sign-extension of the VA to limit the DMA addressing. My concern is that the RISC-V IOMMU seems to couple too many things together:
It looks like the IOVA is coupled with the VA, with the devices' DMA capabilities, and with "the moment where the user is allocating the DMA buffer and the moment where the device is being configured". Why don't we decouple all of them by letting each DC configure a DMA width and ignoring the top bits instead of validating them as "RISC-V valid virtual addresses"? With the current coupled solution in the Linux IOMMU driver, I can still see application problems:
There could always be more issues if we do not solve the problem right where it happens (these address bits have no user in DMA addresses, so why should we force how the bus uses them?). And making a change here seems to match the RISC-V architecture philosophy of forward thinking.
If the IOMMU ignores the top bits, negative addresses could also be valid for such devices. That makes the entire OS virtual address space DMA addressable.
When the IOVA is a VA, the rules for VA translation apply. This includes checking the validity of the VA. Such truncated/invalid VAs will not be accepted by x86 IOMMUs. For ARM IOMMUs such truncations may also lead to the wrong TTB being used for translation, i.e., the address gets translated by TTB0 instead of TTB1.
ARM STE contains both TTB0 and TTB1. Shouldn't the RISC-V IOMMU take care of kernel space VAs?
Unlike the ARM architecture, which allows setting up two page tables (one used when the upper address bits are 0 and one used when the upper address bits are 1), the RISC-V virtual memory system has a single page table that is used to map both forms of addresses. Whether kernel space VAs are mapped with the upper address bits set to all 1s or all 0s is a software convention. The software convention in most 64-bit OSs, including Linux, has been to map kernel VAs with the upper address bits as all 1s, i.e., as negative addresses. As noted in the address map I pasted above, a 64-bit address space has two distinct ranges of valid virtual addresses: a positive range and a negative range. Separating them is a range of invalid/non-canonical addresses. There is no limitation on using valid addresses from either the negative or the positive sub-range.
Suppose a buffer is allocated in the kernel heap (e.g., with kmalloc) and later turned into a direct I/O buffer passed to the DMA. The DMA could treat this VA as a sanitized VA and issue an I/O bus read or write. This could make the IOMMU report a failure, since the bus might truncate the address, leaving the higher bits zeroed. If the IOMMU cannot ignore the higher bits that its incoming connector doesn't support, and instead treats every incoming VA as a whole 64-bit value, how could the IOMMU handle this address?
A device that truncates addresses is not compatible with shared virtual addressing, and the IOVA cannot be a VA. It can be a GPA or an SPA. In an earlier post there was a question about TTB0 and TTB1. Kernel VAs with upper bits set are only mapped in TTB1, and when the device truncates its address the IOMMU in such systems would incorrectly use TTB0.
Suppose we sign-extend the highest bits of the DMA address to 64 bits and compare the result with the correct 64-bit virtual address. Let me try to use a table to describe the ambiguity. For simplicity, consider the case where the IOMMU only supports Sv39 and Sv48.
There could be implementation variations making different applications possible:
Should this be mentioned as UNSPECIFIED in the specification?
The IOVA used to configure device DMA may be one of: VA, GPA, or SPA. The only VA widths supported in RISC-V are 64-bit and 32-bit. The GPA widths supported by RISC-V are 34-bit, 41-bit, 50-bit, and 59-bit. The maximum SPA width is platform specific. A device that supports VA as IOVA, i.e., supports shared virtual addressing, must only be programmed with valid virtual addresses for the address translation mode. A 64-bit VA is valid for mode Sv<n> if bits 63:n == bit n-1. If a device is not capable of generating 64-bit addresses then it must not be programmed with a 64-bit VA. A device or some intermediate shim may attempt to make such truncated addresses valid by sign extension. This sign extension is outside the scope of the IOMMU. However, for such sign extension to work, the device must be capable of holding and generating n bits if the address translation mode is Sv<n>, i.e., if Sv48 is used then the device must be capable of holding at least 48 bits, and if Sv39 is used then it must be capable of holding at least 39 bits.
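The requirement that a device hold at least n bits for sign extension to work can be illustrated with a small Python sketch. The shim behavior modeled by `sign_extend` and the example address are assumptions made for this illustration only.

```python
MASK64 = (1 << 64) - 1


def sign_extend(addr, width):
    """Model a shim that keeps the low `width` bits and replicates bit width-1 upward."""
    low = addr & ((1 << width) - 1)
    if low & (1 << (width - 1)):
        low |= MASK64 & ~((1 << width) - 1)
    return low


va = 0xFFFF_8000_0000_1000  # a valid negative Sv48 address (assumed example)

# A device holding 48 bits: the shim reconstructs the original VA.
print(hex(sign_extend(va, 48)))

# A device holding only 47 bits: bit 47 is lost, so the result is a
# different (positive) address and the original VA cannot be recovered.
print(hex(sign_extend(va, 47)))
```

With 48 bits the round trip is lossless for every valid Sv48 address; with fewer bits, distinct valid addresses collapse onto the same device-visible value.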
Agreed, that would be safer and easier.
I was thinking this kind of sign extension could be done solely by the SoC. In the end, this turns out not to work, as the SoC can easily pad an invalid VA into a valid VA (NO. 8/9/10/12/13/14/15/16, please refer to NO. 8/12).
Taking number 8 as an example: the device is only capable of holding 47 bits. This violates the rule: There is no amount of "fixing" that the SoC or the IOMMU can do to make this device compatible with Sv48 as it can only address 2^47 address range out of the 2^48 valid address range.
In modern operating systems, DMA addresses are not just physical or virtual addresses; please refer to:
https://github.com/torvalds/linux/blob/master/Documentation/core-api/dma-api-howto.rst
Operating systems treat DMA addresses as "bus addresses", which typically refer to a subset of a physical address space or a subset of a virtual address space.
In systems like Linux, a DMA_MASK attribute is applied either system-wide or per device.
When a virtual/physical address is to be converted into a DMA address, the operating system simply masks off the most significant bits of the DMA address, which means the most significant bits of the DMA address are always 0s.
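As a rough illustration of the mask arithmetic, the following Python sketch mirrors the shape of the Linux `DMA_BIT_MASK(n)` macro. The Python function names are invented here, and the check is a simplification rather than kernel code.

```python
def dma_bit_mask(n):
    """Python counterpart of Linux's DMA_BIT_MASK(n): the low n bits set."""
    return (1 << 64) - 1 if n == 64 else (1 << n) - 1


def fits_dma_mask(addr, mask):
    """True if addr has no bits set outside the device's DMA mask."""
    return (addr & ~mask) == 0


mask44 = dma_bit_mask(44)
print(f"{mask44:#018x}")
print(fits_dma_mask(0x0000_0FFF_FFFF_F000, mask44))  # within 44 bits
print(fits_dma_mask(0x0000_7FFF_FFFF_F000, mask44))  # needs 47 bits
```

An address that fits the mask always has zeros in its upper bits, which is the property the rest of this report relies on.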
And there are many ecosystem DMA master IPs that support a limited address width, say 16, 32, or 48 bits. Such IPs do not know whether the programmed DMA addresses are virtual or physical addresses. The integration guide for such silicon won't introduce a design that forces such IPs to be integrated with the most significant bits wired to 1s (e.g., the same as bit 47).
However, the IOMMU specification requires a fault to be generated for virtual addresses whose most significant bits do not comply with the RISC-V virtual address requirement (e.g., bits 63-48 should be the same as bit 47 for Sv48). When an input address is reported as a fault, iotval/iotval2 in the fault queue should be filled in with the full 64-bit input address, while software and hardware are actually only capable of recording masked DMA bus addresses.
This issue causes serious application problems in the operating system. When an IOMMU implementation complies with the specification without DMA_MASK awareness, a fault may be required for non-compliant virtual addresses whose most significant bits are not equal to bit 47. We then see a wrong fault generated in Linux, preventing remapped DMA transfers from being submitted by DMA masters whose DMA_MASK is not 64-bit and which wire the most significant bits of their output DMA addresses to 0s. A stupid workaround could be to wire those bits to bit 47, but then another compliance problem appears in the IOMMU fault queue, which could contain a fault report with the most significant bits wired to 1s in iotval/iotval2, while iotval/iotval2 should be physical addresses whose most significant bits are 0s.
IMHO, the IOMMU specification should introduce a DMA_MASK capability indicating the supported input address width of the IOMMU device, and a similar DMA_MASK field in the device context indicating the supported DMA address width of the DMA master device. The most significant bits beyond min(capabilities.DMA_MASK, DC.DMA_MASK) would be allowed to be all zeros (probably all zeros or all ones, to allow design variations). The virtual address sanity check should be bounded to the effective DMA_MASK range; for the address bits beyond that range, either all checks should be ignored, or we can apply a simple check validating that all such bits equal bit[DMA_MASK], as all other DMA remapping silicon designs do. In the fault queue, masked DMA addresses should also be allowed to be recorded in the iotval/iotval2 fields.
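One possible reading of this proposal is sketched below in Python. Everything in it is hypothetical: the specification defines no DMA_MASK capability or device-context field, the function `proposed_check` and its width parameters are invented for illustration, and only the "upper bits must be all zeros" variant is modeled.

```python
def proposed_check(addr, n, cap_width, dc_width):
    """Hypothetical bounded validity check for mode Sv<n>.

    cap_width / dc_width stand in for the proposed capabilities.DMA_MASK and
    DC.DMA_MASK, expressed as bit widths. Neither exists in the specification.
    """
    width = min(cap_width, dc_width)  # effective DMA width
    if addr >> width:                 # bits beyond the mask must be zero
        return False
    span = width - (n - 1)            # number of bits in [width-1 : n-1]
    if span <= 0:                     # device narrower than the VA: nothing to compare
        return True
    field = (addr >> (n - 1)) & ((1 << span) - 1)
    return field == 0 or field == (1 << span) - 1


# 44-bit master under Sv48: any address below 2^44 passes.
print(proposed_check(0x0000_0FFF_FFFF_F000, 48, 64, 44))
# 44-bit master under Sv39: bits 43:38 must still agree with each other.
print(proposed_check(0x0000_0FC0_0000_0000, 39, 64, 44))
print(proposed_check(0x0000_0040_0000_0000, 39, 64, 44))
```

Under this reading the canonical-form comparison is confined to the bits the master can actually drive, which is the behavior being requested in this report.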