Missing DMA_MASK in IOMMU specification leaves application problems in the real operating system #391

Open
zetalog opened this issue Jul 30, 2024 · 29 comments

Comments

@zetalog

zetalog commented Jul 30, 2024

In modern operating systems, DMA addresses are not simply physical or virtual addresses; please refer to:
https://github.com/torvalds/linux/blob/master/Documentation/core-api/dma-api-howto.rst
Operating systems treat DMA addresses as "bus addresses", which typically refer to a subset of a physical address space or a subset of a virtual address space.
In systems like Linux, a DMA_MASK attribute is widely used, applied either system-wide or per device.
When a virtual/physical address is converted into a DMA address, the operating system simply masks off the most significant bits of the DMA address, so those bits are always 0.

Many ecosystem DMA master IPs also support only a limited address width, say 16, 32, or 48 bits. Such IPs do not know whether the programmed DMA addresses are virtual or physical. The integration guides for such silicon will not introduce a design that forces these IPs to be integrated with their most significant bits wired to 1s (e.g., copied from bit 47).

However, the IOMMU specification requires a fault to be generated for virtual addresses whose most significant bits do not comply with the RISC-V virtual address requirement (e.g., bits 63:48 must equal bit 47 for Sv48). When an input address is reported as a fault, iotval/iotval2 in the fault queue record should be filled in with the full 64-bit input address, whereas software and hardware are actually only capable of recording masked DMA bus addresses.

This issue causes serious application problems in the operating system. When an IOMMU implementation complies with the specification without DMA_MASK awareness, it may be required to report a fault for non-compliant virtual addresses whose most significant bits are not equal to bit 47. We then see wrong faults in Linux that prevent remapped DMA transfers from being submitted by DMA masters whose DMA_MASK is narrower than 64 bits and which wire the most significant bits of their output DMA addresses to 0s. A crude workaround would be to wire those bits to bit 47, but that exposes another compliance problem in the IOMMU fault queue: a fault record could then carry most significant bits wired to 1s in iotval/iotval2, while iotval/iotval2 should hold physical addresses whose most significant bits are 0s.

IMHO, the IOMMU specification should introduce a DMA_MASK capability indicating the input address width supported by the IOMMU device, and a similar DMA_MASK field in the device context indicating the DMA address width supported by the DMA master device. The most significant bits beyond min(capabilities.DMA_MASK, DC.DMA_MASK) would be allowed to be all zeros (or probably either all zeros or all ones, to allow design variations). The virtual address sanity check should be bounded to the effective DMA_MASK range; for address bits beyond that range, either all checks should be ignored or a simple check could validate that all such bits equal bit[DMA_MASK], as other DMA remapping silicon designs do. In the fault queue, masked DMA addresses should also be allowed to be recorded in the iotval/iotval2 fields.

@zetalog zetalog changed the title Missing DMA_MASK in IOMMU specification and input addresses are used in the wrong way Missing DMA_MASK in IOMMU specification leaves application problems in real operating system Jul 30, 2024
@zetalog zetalog changed the title Missing DMA_MASK in IOMMU specification leaves application problems in real operating system Missing DMA_MASK in IOMMU specification leaves application problems in the real operating system Jul 30, 2024
@ved-rivos
Collaborator

ved-rivos commented Jul 30, 2024

The IOMMU provides the address that was submitted for address translation in the fault record. Software may apply masks as appropriate to the reported addresses.

> The integration guide of such silicon won't introduce a design forcing such IPs to be integrated with most significant bits wired to 1s (e.g., same as bit 47).

The addresses presented to the IOMMU are addresses generated by the IP. If the IP cannot generate an address wider than, say, 16 bits, then the IOMMU will not see any address bits set beyond what the IP can generate. There is no requirement to wire the most significant bits to all 1s.

@zetalog
Author

zetalog commented Jul 30, 2024

> The addresses presented to the IOMMU are addresses generated by the IP. If the IP cannot generate an address beyond say 16-bits then the IOMMU would not see any addresses requested for translation bits beyond what the IP can generate. There is no requirement to wire most significant bits to all 1s etc.

Then we could introduce a configurable "DMA_MASK" option into the reference model that skips the check on the significant bits beyond the configured DMA_MASK, so that the model can also be used for real SoC verification where such DMA masters are integrated.

@ved-rivos
Collaborator

> Then we may introduce a configurable "DMA_MASK" option into the reference model, ignoring significant bits checking if it resides beyond the configured DMA_MASK.

That would be an incorrect thing to do. The IOMMU should prevent misbehaving, misprogrammed, and/or malicious devices from accessing memory that they are not authorized to access. Masking the address will at best hide the misprogramming and at worst hide the malicious behavior or misbehavior.

@zetalog
Author

zetalog commented Jul 30, 2024

IOMMU silicon developed to enforce the following code observes a wrong fault in Linux, preventing such DMA masters from being used:

    // 1. Let a be satp.ppn × PAGESIZE, and let i = LEVELS − 1. PAGESIZE is 2^12. (For Sv32,
    //    LEVELS=2, For Sv39 LEVELS=3, For Sv48 LEVELS=4, For Sv57 LEVELS=5.) The satp register
    //    must be active, i.e., the effective privilege mode must be S-mode or U-mode.
    if ( iosatp.MODE == IOSATP_Sv32 && SXL == 1 ) {
        ...
        // When `SXL` is 1, the following rules apply:
        // * If the S/VS-stage page table is not `Bare` then a page fault corresponding to
        //   the original access type occurs if the `IOVA` has bits set beyond bit 31.
        mask = (1UL << (64 - 32)) - 1;
        masked_upper_bits = (iova >> 32) & mask;
    }
    if ( iosatp.MODE == IOSATP_Sv39 && SXL == 0 ) {
        ...
        mask = (1UL << (64 - 38)) - 1;
        masked_upper_bits = (iova >> 38) & mask;
    }
    if ( iosatp.MODE == IOSATP_Sv48 && SXL == 0 ) {
        ...
        mask = (1UL << (64 - 47)) - 1;
        masked_upper_bits = (iova >> 47) & mask;
    }
    if ( iosatp.MODE == IOSATP_Sv57 && SXL == 0 ) {
        ...
        mask = (1UL << (64 - 56)) - 1;
        masked_upper_bits = (iova >> 56) & mask;
    }
    // Instruction fetch addresses and load and store effective addresses,
    // which are 64 bits, must have bits 63:<VASIZE> all equal to bit
    // (VASIZE-1), or else a page-fault exception will occur - for SXL=0
    // Perform the canonical-address check - for SXL=0
    // For SXL = 1 check bits 63:32 are all 0
    if ( (masked_upper_bits != 0 && masked_upper_bits != mask && SXL == 0) ||
         (masked_upper_bits != 0 && SXL == 1) ) goto page_fault;

IMO, we should alter this piece of code a bit to allow an SoC-specific DMA_MASK to be configured, to prevent wrong faults from being generated for DMA masters whose address width is less than 39 or less than 48.

This is useful when a verification TB is created with an IOMMU + DMAC (address width less than 39/48) + reference model.

@ved-rivos
Collaborator

> IMO, we should alter this piece of code a bit to allow an SoC specific DMA_MASK to be configured to prevent wrong fault to be generated for the DMA masters whose address width is less than 39, or less than 48.

Consider a DMA master that can only generate addresses narrower than 48 bits, say 16-bit wide addresses. Such a DMA master can only legally generate addresses 0x000000000000 through 0x00000000FFFF. When Sv48 is in use, all of these addresses are legal. The ranges of legal addresses for Sv48 are 0x0000'0000'0000'0000 to 0x0000'7FFF'FFFF'FFFF and 0xFFFF'8000'0000'0000 to 0xFFFF'FFFF'FFFF'FFFF. Could you please give an example of an address that faults?
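
The rule behind those two ranges can be written as a short check. This is only an illustrative sketch, not code from the reference model; the helper name `va_is_canonical` is made up here, and `n` is assumed to be 39, 48, or 57.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `va` is a valid 64-bit virtual address for mode Sv<n>,
 * i.e. bits 63:n all equal bit n-1. Assumes n is 39, 48 or 57. */
static bool va_is_canonical(uint64_t va, unsigned n)
{
    uint64_t upper    = va >> (n - 1);          /* bits 63:n-1          */
    uint64_t all_ones = UINT64_MAX >> (n - 1);  /* same field, all set  */

    /* Either every bit in 63:n-1 is 0 (positive range)
     * or every bit is 1 (negative range). */
    return upper == 0 || upper == all_ones;
}
```

With this check, every address a 16-bit master can produce (0x0000 through 0xFFFF, upper bits zero) passes for Sv48, while 0x0000'8000'0000'0000 does not.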

@zetalog
Author

zetalog commented Jul 30, 2024

The real issue is seen with a DMAC in an SoC design whose master supports at most 44 bits; the DMAC designer therefore wires the higher bits of the DMA addresses to 0s.

When we use such a DMAC in the Linux kernel with IOMMU remapping enabled, the following kernel panic is seen due to the IOMMU VA checking logic, preventing a valid DMA transfer from being performed:

[Image: screenshot of the kernel panic]

@zetalog
Author

zetalog commented Jul 30, 2024

OK, I see. You mean 0x7fff_ffff_x000 should never have been a valid address for this kind of DMA device; it only shows up because its driver does not invoke dma_set_mask(44). As such, there is no need to worry about seeing an input address between 0xFFFF'8000'0000'0000 and 0xFFFF'FFFF'FFFF'FFFF.
In conclusion, when such a test bench is created, the input address stimuli should be controlled on the TB side rather than in the IOMMU reference model.
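
For reference, a user-space sketch of what the mask means on the software side. In a real Linux driver the call would typically be `dma_set_mask_and_coherent(dev, DMA_BIT_MASK(44))`; the macro below has the same shape as the kernel's `DMA_BIT_MASK()`, while `dma_addr_fits` is a hypothetical helper written only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the Linux DMA_BIT_MASK() macro: the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: true if the whole buffer [addr, addr + len)
 * is reachable by a device with the given DMA mask. */
static bool dma_addr_fits(uint64_t addr, uint64_t len, uint64_t mask)
{
    if (len == 0)
        return false;

    uint64_t last = addr + len - 1;

    /* Reject wrap-around, then compare the last byte with the mask. */
    return last >= addr && last <= mask;
}
```

With a 44-bit mask, a page at 0x7fffffffc000 does not fit, so a mapping at that address should not be handed to this device in the first place.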

@zetalog
Author

zetalog commented Jul 30, 2024

One more concern relates to the IOMMU device capability. What if the IOMMU only accepts 44-bit input addresses, and the DMA masters are also configured to use input addresses of at most 44 bits?
In such a case, shall we introduce this kind of capability in the IOMMU reference model?

@ved-rivos
Collaborator

ved-rivos commented Jul 30, 2024

In the example you posted, software probably created DMA mappings for:

  • 0x7fffffffc000
  • 0x7fffffff8000

It then programmed the device to copy from one page to another. The device can only generate 44-bit addresses, but it has been asked to DMA to a wider address. The device ends up dropping the upper bits and instead DMAs to the following addresses:

  • 0x0fffffffc000
  • 0x0fffffff8000

And this leads to a fault, as expected, since the DMA is to an address that was not mapped. From the print it seems like it's a G-stage fault?

This clearly is a programming mistake. A device that can generate only 44-bit addresses should not be given an address at or above 2^44. Further, in this case it is unclear what a mask in the IOMMU could do: the address has already been masked by the device.
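
The arithmetic of that truncation, as a minimal sketch. The function name `truncate_to_width` is invented for this example and simply models a master that drops address bits at and above its bus width.

```c
#include <stdint.h>

/* Model of a DMA master with a `width`-bit address bus (1 <= width <= 64):
 * address bits at and above `width` are dropped and read back as 0. */
static uint64_t truncate_to_width(uint64_t addr, unsigned width)
{
    if (width >= 64)
        return addr;
    return addr & ((UINT64_C(1) << width) - 1);
}
```

For a 44-bit master, `truncate_to_width(0x7fffffffc000, 44)` yields 0x0fffffffc000, which is the address the IOMMU actually sees.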

> One more concern is related to the IOMMU device capability. What if the IOMMU only accepts 44-bits input address, and the DMA masters are configured to use maximum 44-bits input address as well.

The IOMMU accepts either 64-bit addresses (RV64) or 32-bit addresses (RV32). There is no specification for an IOMMU with a 44-bit input address.

@zetalog
Author

zetalog commented Jul 31, 2024

OK, this makes the root cause clearer.
Thanks and best regards.

@zetalog zetalog closed this as completed Jul 31, 2024
@zetalog
Author

zetalog commented Dec 6, 2024

Regarding the PAS field: it reflects the bit width of the DMA masters or of the bus interconnects that the IOMMU is connected to. For these SoC components (DMA masters/bus interconnects), the higher bits are always truncated to 0s, a padding rule that is not compatible with the RISC-V virtual addresses used by CPU components. In OSes, these addresses are also known as bus addresses or DMA addresses.

Thus PAS actually reflects the bit width of both the input address (which could be virtual) and the output address, so it is not really "physical". Shall we use bus address size (BAS) or DMA address size (DMA_SIZE) instead of PAS? And if possible, should we put this into the device context, since the address width capability varies from DMA master to DMA master?

@zetalog zetalog reopened this Dec 6, 2024
@ved-rivos
Collaborator

> For PAS field, since it reflects the bit width of the DMA masters or the bus interconnects which the IOMMU is connected to.

The PAS field reflects the physical address space addressable by the IOMMU and not the bit width of DMA masters.

@zetalog
Author

zetalog commented Dec 9, 2024

I am not sure we are on the same page. Let me describe the scenarios I am considering.

Consider an SoC system in which the DMA master is capable of addressing 40-bit DMA addresses. After translation by the IOMMU, the 40-bit addresses become 44-bit bus addresses on the bus interconnects (PAS is 44).
[Image: DMA addressing]
In OSes, when the DMA master is programmed through a non-bare DMA remapping facility (IOMMU), the 40-bit dma_mask is applied to the virtual addresses.

Consider a second scenario, in which the SoC design combines multiple DMA masters into a sub-IO network with bus interconnects to reduce the number of required SIDs:
[Image: IOMMU-applications]

In the first scenario, how could the IOMMU limit the width of the input addresses? I can only see PAS in the IOMMU specification.

For the second scenario, when the DMA masters issue DMA requests with VAs whose higher bits could be 1 (bit 38 for Sv39 or bit 47 for Sv48), how could the VA bus interconnects restore the higher bits if DMAC0/DMAC1 truncate the DMA addresses?
Since no input address width is configurable in the IOMMU, should the VA bus restore the address back to 64 bits (bits 63-44 for DMAC0 and bits 63-40 for DMAC1)?
Or, if we only need to restore the VA back to 44 bits, how does the VA bus interconnect pad bits 40-43 for DMAC1?

@ved-rivos
Collaborator

> Considering an Soc system, DMA is capable of addressing 40-bit DMA addresses. After being translated by the IOMMU, the 40-bit is translated into 44-bit bus addresses to be connected to the bus interconnects (PAS is 44).

The PAS is not related to any of this. The PAS in the specification is what the IOMMU can address.

> In the first scenario, how could IOMMU to limit the width of the input addresses? I could only see PAS in the IOMMU specification.

The driver of DMAC1 should only program an address below 2^40 and the driver of DMAC0 an address below 2^44.

> For the second scenario, when the DMA masters issue the DMA requests with the VAs whose higher bits (bit 38 or bit 47 is one concerning with Sv39 or Sv48) could be 1, how could the VA bus interconnects restore the higher bits if the DMAC0 / DMAC1 truncate the DMA addresses out?

If the bus of a DMA master is 40 bits wide, then it should not be programmed with an address that is wider than 40 bits. Such a device will likely not even have registers to hold an address wider than 40 bits.

> Since there is no input address width configurable in IOMMU, should the VA bus restores the address back to 64-bit (bit 63-44 for DMAC0 and bit 63-40 for DMAC1)? Or if we only need to restore the VA address back to 44-bit, how the VA bus interconnect pads the higher bit 40-43 for the DMAC1?

Such buses and interconnects are outside the scope of the IOMMU to specify.

@zetalog
Author

zetalog commented Dec 9, 2024

> The driver of DMAC1 should only program an address below 2^40 and the driver of DMAC0 an address below 2^44.

For Sv39, an address with bit 38 = 1 is a valid address, since 38 is smaller than 40 or 44, and the OS kernel can have a DMA address remapped into the bit-38 = 1 region. OSes probably do not apply any special bounce buffering to a direct IO address when the VA size is smaller than the DMA size. Bits 39 up to the CPU bus width (e.g., 39-63) would be padded with bit 38.
When this address is programmed into the DMA master, the higher bits (bits 40-63 for DMAC1 and bits 44-63 for DMAC0) could be truncated by the DMA controller.
The IOMMU will then receive an address with bits 38-39 padded with 1 by the OS and bits 40-63 truncated to 0 by DMAC1, or an address with bits 38-43 padded with 1 by the OS and bits 44-63 truncated to 0 by DMAC0. From the IOMMU's point of view, is this kind of VA a valid Sv39 address?

@ved-rivos
Collaborator

ved-rivos commented Dec 9, 2024

> Then the IOMMU will get an address with bit 38-39 padded with 1 by OS and 40-63 truncated to 0 by DMAC1, an address with bit 38-43 padded with 1 by OS and 44-63 truncated to 0 by DMAC0. From IOMMU's point of view, is such kind of VA a valid Sv39 address?

When mode is configured as Sv39/Sv48/Sv57, the VA must be a 64-bit address. There are only two VA lengths - 64 and 32 - supported by the IOMMU. The addresses where bits 63:39 do not match bit 38 are not valid virtual addresses for Sv39. In general for mode SvN, the address bits 63:N must be equal to bit N-1 for the address to be a valid virtual address.

Since the devices in this example are not 64-bit capable, they cannot be programmed with a 64-bit VA. They may be programmed with a PA or a GPA. If a VA needs to be used then the OS has to use Sv48 or Sv57 and map the VA below 2^40/2^44 for such devices.

@zetalog
Author

zetalog commented Dec 10, 2024

Hi, I drew a chart of the bug scenario below. It is based on the OS DMA allocation scenario (dma_alloc_coherent(), as compared with the dma_mmap_coherent() scenario):
[Image: DMA application]
Which of steps 1, 2, and 3 could be done differently to make the VA valid?
And I'll get back to you after correctly organizing my words to reply to your concerns.
Thanks for the patience.

@ved-rivos
Collaborator

ved-rivos commented Dec 10, 2024

For such a device, the DMA width should be limited to 38 bits and not 40 bits. Alternatively, one can use Sv48 mode.

With Sv39, 39 bits of the VA are translated (12 bits of page offset, and 9x3 = 27 bits used as page table indices).
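
As a sketch of that 12 + 9x3 split (helper names are made up for illustration): only VA bits 38:0 feed the page-table walk, which is why the validity of bits 63:39 is a separate check.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* 12 bits of page offset           */
#define VPN_BITS    9   /* 9 index bits per page-table level */

/* Page offset: VA bits 11:0. */
static unsigned sv39_page_offset(uint64_t va)
{
    return (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
}

/* Page-table index for `level` (2 = root table, 0 = last level).
 * Level 2 uses VA bits 38:30, level 1 bits 29:21, level 0 bits 20:12. */
static unsigned sv39_vpn(uint64_t va, unsigned level)
{
    unsigned shift = PAGE_SHIFT + VPN_BITS * level;
    return (unsigned)((va >> shift) & ((1u << VPN_BITS) - 1));
}
```

For 0xFFFFFFC000100000 the root index is 256, i.e. the upper half of the single root table; a negative address is distinguished by bit 38, not by a second page table.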

For a device that truncates addresses, any negative address such as 0xFFFFFFC000100000 is invalid. Since these devices cannot preserve the upper bits and instead replace them with 0, only positive addresses can be used with such devices.

For Sv39, the range of legal addresses is as follows:
[Image: Sv39 address map]

For such devices that replace the upper bits with 0, the range of valid addresses is 0x0000000000000000 to 0x0000003FFFFFFFFF. Specifically, the address 0x000000C000100000 is a non-canonical and illegal address.
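
One way to read the 38-bit limit, sketched under the assumption that the device zero-fills everything above its bus width: an IOVA is safe only if it survives truncation (below 2^width) and lies in the positive half for Sv<n> (below 2^(n-1)). The helper names and the min() formulation are my own summary, not wording from the specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of low address bits a zero-filling device with a
 * `dev_width`-bit bus can safely use under mode Sv<n>. */
static unsigned usable_iova_bits(unsigned dev_width, unsigned n)
{
    return dev_width < n - 1 ? dev_width : n - 1;
}

/* True if `iova` can be programmed into such a device and still reach
 * the IOMMU as the same, valid Sv<n> address. */
static bool iova_ok_for_device(uint64_t iova, unsigned dev_width, unsigned n)
{
    return (iova >> usable_iova_bits(dev_width, n)) == 0;
}
```

A 40-bit device gets 38 usable bits under Sv39 and 40 under Sv48, so 0x000000C000100000 is rejected for Sv39 but acceptable for Sv48.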

@zetalog
Author

zetalog commented Dec 11, 2024

OK, we seem to be on the same page. With the current Linux kernel driver, we can remove the sign extension of the VA to limit the DMA addressing.

My concern is that RISC-V IOMMU seems to couple too many things together:

  1. The IOMMU input address width is related to the capability of the DMA masters and the bus interconnects, yet currently the width is determined by the highest Sv mechanism that the IOMMU supports. So it is always limited to (VA_bits - 1).
  2. When a device configures its translation mode, it should always use the highest Sv mechanism that the IOMMU supports, to avoid ending up with a wrongly limited dma_mask. This is because the input address width comes from "capabilities" rather than being settable through "fctl".
  3. When the OS kernel shares a virtual address space with devices, the shared virtual addressing must use the same translation mechanism as the highest Sv mechanism that the IOMMU supports.
  4. DMA masters and bus interconnects cannot fully utilize their addressing capability when working with the RISC-V IOMMU. Even worse, they can only address half of the entire VA space, which makes the entire kernel space not directly addressable by DMA.

It looks like IOVA is coupled with VA, with the devices' DMA capabilities, and with both "the moment when the user allocates the DMA buffer" and "the moment when the device is configured".

Why don't we decouple all of them by letting each DC configure a DMA width, and ignore the top bits instead of validating them as "RISC-V valid virtual addresses"?

With the current coupled solution in the Linux IOMMU driver, I can still see application problems:

  1. The SVA translation mode differs from the highest translation mechanism supported by the IOMMU.
  2. When an SoC design embeds SoC-specific attributes in the higher bits, which should be unused by the bus/DMA, the IOMMU faults.
  3. DMA masters can hardly utilize their entire addressing capability.
  4. Probably more that we have not seen yet.

There could always be more issues if we do not solve the problem right where it occurs (nothing uses these address bits in DMA addresses, so why should we dictate how the bus uses them?). And making a change here seems to match the RISC-V architecture philosophy of forward thinking.

@zetalog
Author

zetalog commented Dec 11, 2024

> For a device that truncates addresses, any negative address such as 0xFFFFFFC000100000 is invalid. Since these devices cannot preserve the upper bits and instead replace them with 0, only positive addresses can be used with such devices.

If the IOMMU ignored the top bits, negative addresses could also be valid for such devices. That would make the entire OS virtual address space DMA addressable.

@ved-rivos
Collaborator

ved-rivos commented Dec 12, 2024

When the IOVA is a VA, the rules for VA translation apply. This includes checking for validity of the VA.

Such truncated/invalid VAs will not be accepted by x86 IOMMUs. For ARM IOMMUs such truncations may also lead to the wrong TTB being used for translation i.e. get translated by TTB0 instead of TTB1.

@zetalog
Author

zetalog commented Dec 17, 2024

> For ARM IOMMUs such truncations may also lead to the wrong TTB being used for translation i.e. get translated by TTB0 instead of TTB1.

The ARM STE contains both TTB0 and TTB1. Shouldn't the RISC-V IOMMU take care of kernel space VAs?

@ved-rivos
Collaborator

> ARM STE contains both TTB0 and TTB1. Shouldn't RISC-V IOMMU take care of kernel space VA?

Unlike the ARM architecture, which allows setting up two page tables (one used when the upper address bits are 0 and one used when the upper address bits are 1), the RISC-V virtual memory system has a single page table that maps both forms of addresses. Whether kernel space VAs are mapped with the upper address bits set to all 1 or all 0 is a software convention. The convention in most 64-bit OSes, including Linux, has been to map kernel VAs with the upper address bits all 1, i.e., as negative addresses. As noted in the address map I pasted above, a 64-bit address space has two distinct ranges of valid virtual addresses: a positive range and a negative range. Separating them is a range of invalid/non-canonical addresses. There is no limitation on using valid addresses from either the negative or the positive sub-range.

@zetalog
Author

zetalog commented Dec 20, 2024

> There is no limitation in using valid addresses from either the negative or positive address sub-ranges.

When a buffer is allocated from the kernel heap (e.g., kmalloc) and later turned into a direct IO buffer passed to the DMA master, the DMA master could treat this VA as a sanitized VA and issue an IO bus read/write. This could make the IOMMU report a fault, as the bus might truncate the address, leaving the higher bits zeroed. If the IOMMU cannot ignore the higher bits that its incoming connector does not support, and instead treats every incoming VA as a whole 64-bit value, how could the IOMMU handle this address?
How can this VA be correctly translated by the IOMMU without any performance degradation? IMO, the fastest way is to ignore the higher bits, which nothing in the entire system cares about.

@ved-rivos
Collaborator

A device that truncates addresses is not compatible with shared virtual addressing and the iova cannot be a VA. It can be a GPA or a SPA. In an earlier post there was a question about TTB0 and TTB1. Kernel VA with upper bits set are only mapped in TTB1 and when the device truncates its address the iommu in such systems would incorrectly use TTB0.

@zetalog
Author

zetalog commented Dec 25, 2024

Suppose we sign-extend the highest bit of the DMA address to 64 bits and compare the result with the correct 64-bit virtual address. Let me use a table to describe the ambiguity. For simplicity, consider the case where the IOMMU only supports Sv39 and Sv48.

| NO. | DMA address size | Translation mode | Negative address | Extended bit | Fault | Descriptions |
|---|---|---|---|---|---|---|
| 1 | >= 48 | Sv39 | No | VA[38] | No | 0x0000_0000_0000_0000-0x0000_003F_FFFF_FFFF is valid |
| 2 | >= 48 | Sv39 | Yes | VA[38] | No | 0xFFFF_FFC0_0000_0000-0xFFFF_FFFF_FFFF_FFFF is valid |
| 3 | >= 48 | Sv48 | No | VA[47] | No | 0x0000_0000_0000_0000-0x0000_7FFF_FFFF_FFFF is valid |
| 4 | >= 48 | Sv48 | Yes | VA[47] | No | 0xFFFF_8000_0000_0000-0xFFFF_FFFF_FFFF_FFFF is valid |
| 5 | < 48 && >= 39 | Sv39 | No | VA[38] | No | 0x0000_0000_0000_0000-0x0000_003F_FFFF_FFFF is valid |
| 6 | < 48 && >= 39 | Sv39 | Yes | VA[38] | No | 0xFFFF_FFC0_0000_0000-0xFFFF_FFFF_FFFF_FFFF is valid |
| 7 | = 47 | Sv48 | No | VA[47]=0 | No | 0x0000_0000_0000_0000-0x0000_7FFF_FFFF_FFFF is valid |
| 8 | = 47 | Sv48 | Yes | VA[47]! | No | 0xFFFF_8000_0000_0000-0xFFFF_BFFF_FFFF_FFFF is invalid |
| 9 | < 47 && >= 39 | Sv48 | No | VA[47]! | Yes | (1<<(DMA-1))-0x0000_7FFF_FFFF_FFFF is invalid |
| 10 | < 47 && >= 39 | Sv48 | Yes | VA[47]! | Yes | 0xFFFF_8000_0000_0000-(0-(1<<(DMA-1))-1) is invalid |
| 11 | = 38 | Sv39 | No | VA[38]=0 | No | 0x0000_0000_0000_0000-0x0000_003F_FFFF_FFFF is valid |
| 12 | = 38 | Sv39 | Yes | VA[38]! | Yes | 0xFFFF_FFC0_0000_0000-0xFFFF_FFDF_FFFF_FFFF is invalid |
| 13 | < 38 | Sv39 | No | VA[38]! | Yes | (1<<(DMA-1))-0x0000_003F_FFFF_FFFF is invalid |
| 14 | < 38 | Sv39 | Yes | VA[38]! | Yes | 0xFFFF_FFC0_0000_0000-(0-(1<<(DMA-1))-1) is invalid |
| 15 | < 39 | Sv48 | No | VA[47]! | Yes | (1<<(DMA-1))-0x0000_7FFF_FFFF_FFFF is invalid |
| 16 | < 39 | Sv48 | Yes | VA[47]! | Yes | 0xFFFF_8000_0000_0000-(0-(1<<(DMA-1))-1) is invalid |

There could be implementation variations making different applications possible:

  1. Is NO. 2/4 supported by IOMMU?
  2. Is NO. 6 supported by IOMMU?
  3. Is NO. 7/11 supported by IOMMU?

Should this be mentioned as UNSPECIFIED in the specification?

@zetalog zetalog closed this as completed Dec 25, 2024
@zetalog zetalog reopened this Dec 25, 2024
@ved-rivos
Collaborator

ved-rivos commented Dec 25, 2024

The IOVA used to configure device DMA may be one of: VA, GPA, or SPA. The only VA widths supported in RISC-V are 64-bit and 32-bit. The GPA widths supported by RISC-V are 34-bit, 41-bit, 50-bit, and 59-bit. The maximum SPA width is platform specific. A device that supports VA as IOVA, i.e., supports shared virtual addressing, must only be programmed with valid virtual addresses for the address translation mode. A 64-bit VA is valid for mode Sv<n> if bits 63:n all equal bit n-1.
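
The GPA widths listed above follow from the G-stage modes Sv32x4, Sv39x4, Sv48x4, and Sv57x4, each of which widens the address by two bits. A small sketch with invented helper names; unlike a VA, a GPA is zero-extended, so a narrow device can carry one without any upper-bit padding.

```c
#include <stdbool.h>
#include <stdint.h>

/* GPA width for G-stage mode Sv<n>x4: two bits wider than Sv<n>.
 * n = 32 -> 34, 39 -> 41, 48 -> 50, 57 -> 59. */
static unsigned gpa_width(unsigned n)
{
    return n + 2;
}

/* A GPA is valid for Sv<n>x4 if no bit at or above the GPA width is set. */
static bool gpa_is_valid(uint64_t gpa, unsigned n)
{
    return (gpa >> gpa_width(n)) == 0;
}
```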

If a device is not capable of generating 64-bit addresses then such devices must not be programmed with a 64-bit VA.

A device or some intermediate shim may attempt to make such truncated addresses valid by sign extension. This sign extension is outside the scope of the IOMMU. However, for such sign extension to work, the device must be capable of holding and generating n bits if the address translation mode is Sv<n>, i.e., if Sv48 is used then the device must be capable of holding at least 48 bits, and if Sv39 is used then it must be capable of holding at least 39 bits.

@zetalog
Author

zetalog commented Dec 26, 2024

> However, for such sign extension to work, the device must be capable of holding and generating n bits if the address translation mode is Sv< n > i.e., if Sv48 is used then the device must be capable of holding at least 48 bits and if Sv39 is used then it is capable of holding at least 39 bits.

Agreed, that would be safer and easier.

> A device or some intermediate shim may attempt to make such truncated addresses valid by sign extension. This sign extension is outside the scope of the IOMMU.

I was thinking this kind of sign extension could be done solely by the SoC. In the end, this turns out not to work, as the SoC can easily pad an invalid VA into a valid VA (NO. 8/9/10/12/13/14/15/16; please refer to NO. 8/12).
So if someone attempts to make such truncated addresses valid by sign extension, it can only be done by the IOMMU, which is the only component in the system that knows the correct padding bit (the bit to sign-extend from) via the translation modes.

@ved-rivos
Collaborator

> As SoC can easily pad an invalid VA into a valid VA (NO. 8/9/10/12/13/14/15/16, please refer to NO. 8/12).

Taking number 8 as an example. The device is only capable of holding 47 bits. This violates the rule:
"The device must be capable of holding and generating n bits if the address translation mode is Sv< n >"

There is no amount of "fixing" that the SoC or the IOMMU can do to make this device compatible with Sv48, as it can only address a 2^47 address range out of the 2^48 valid address range.
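
To make the number 8 case concrete, here is a sketch of a hypothetical shim that sign-extends from the device's top bit. All names are invented for the example, and `w` is assumed to be between 1 and 63.

```c
#include <stdbool.h>
#include <stdint.h>

/* Keep only the low `w` bits, as a w-bit device would (1 <= w <= 63). */
static uint64_t truncate_bits(uint64_t a, unsigned w)
{
    return a & ((UINT64_C(1) << w) - 1);
}

/* Hypothetical shim: rebuild a 64-bit address by copying bit w-1
 * into bits 63:w. */
static uint64_t sign_extend_from(uint64_t a, unsigned w)
{
    uint64_t low  = truncate_bits(a, w);
    uint64_t high = ~((UINT64_C(1) << w) - 1);   /* bits 63:w set */

    return ((low >> (w - 1)) & 1) ? (low | high) : low;
}

/* True if `va` comes back unchanged after a w-bit device plus the shim. */
static bool round_trips(uint64_t va, unsigned w)
{
    return sign_extend_from(truncate_bits(va, w), w) == va;
}
```

With w = 48, both halves of the Sv48 range round-trip. With w = 47, the valid Sv48 addresses 0x0000'4000'0000'0000 and 0xFFFF'C000'0000'0000 truncate to the same 47-bit value, so no shim can tell which one the software meant.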
