VMware vSphere/ESXi use #82
ESXi supports only local 4Kn SAS and SATA HDDs. The key word above is "local". Additionally: "ESXi detects and registers the [local] 4Kn devices and automatically emulates them as 512e."
I have a similar ticket but in the opposite direction and for RapidDisk-Cache: #59.
@pkoutoupis you've done a REALLY fantastic job with this project! The more I learn and read the wiki, the more I see massive potential to check off the big three items: a "white box" NVMe over RDMA/TCP SAN, with caching, and a REST API.
I'm curious, @pkoutoupis, and happy to help in any way I can to get 512e block size working, plus whatever else is required to allow use with VMware ESXi.
Hmmm. I am actually kind of surprised that blkdev is returning 4K when only the physical block size is 4K and the logical is 512 bytes:
BYTES_PER_SECTOR is 512 bytes. I didn't think I made it 4K, and as you can see from the code above, the logical block size is not 4K either.
Strangely, even when I change the physical block size to match that of the logical...
Hmmmm. Not sure what is going on here. Even when I change the minimum I/O size from 4K to 512....
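(For reference, the values the block layer actually advertises can be checked from sysfs; this assumes the RAM drive shows up as rd0 — adjust the name to your device:)

```
# Assuming the RAM drive is rd0; adjust the name as needed.
cat /sys/block/rd0/queue/logical_block_size
cat /sys/block/rd0/queue/physical_block_size
cat /sys/block/rd0/queue/minimum_io_size
cat /sys/block/rd0/queue/optimal_io_size
```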
EDIT: What I would recommend is to set up another Linux node to see if it detects/imports the NQN. Let us make sure that it is being exported correctly in the first place.
What I find interesting is....
It reports the logical sector size as 512, but again, the "block size" is 4K. This utility is supposed to send ioctls to the block device:
^ from the blockdev source code. What I find strange is that I default everything to 512 bytes:
AND when I call blockdev with --getbsz (or anything else), my module does not register the ioctl. Oh, I just stumbled on this interesting piece of info which can explain this: https://bugzilla.redhat.com/show_bug.cgi?id=1684078 So, I do not think that our problem is 512e related. Again, I would see if we can import it on another Linux node. And I would compare the results from the rapiddisk utility to my article published a few years ago: https://www.linuxjournal.com/content/data-flash-part-iii-nvme-over-fabrics-using-tcp
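(To make that distinction concrete, a quick sketch, assuming the device is /dev/rd0; each blockdev flag maps onto a different ioctl:)

```
# Assuming the device is /dev/rd0.
blockdev --getss   /dev/rd0   # BLKSSZGET: logical sector size (should be 512)
blockdev --getpbsz /dev/rd0   # BLKPBSZGET: physical block size
blockdev --getbsz  /dev/rd0   # BLKBSZGET: soft (page-cache) block size, often 4096
```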
Yeah, I just hacked up my code to see if the ram drive sees the ioctls and it doesn't. The kernel is returning those values. Strange.
@pkoutoupis good inclination on testing from outside ESXi:
@pkoutoupis never mind, I forgot I had rebooted the target host, which wiped the rapiddisk rd0 cache, the mapping to the md array, and the NVMe over TCP target! BTW, how do I persist these past a reboot?
For example, I have to do this on every boot on the target VM:
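(A sketch of the equivalent steps using the in-kernel nvmet configfs interface directly, rather than the rapiddisk CLI; the NQN, device path, and address below are placeholders:)

```
#!/bin/sh
# Placeholders: adjust the NQN, exported device, and listen address/port.
NQN=nqn.2021-01.org.rapiddisk:example
DEV=/dev/mapper/rc-wt_md127   # the cache mapping, not the raw md device
ADDR=192.168.1.10

modprobe nvmet-tcp            # assumes configfs is already mounted

SUB=/sys/kernel/config/nvmet/subsystems/$NQN
mkdir -p $SUB
echo 1 > $SUB/attr_allow_any_host
mkdir -p $SUB/namespaces/1
echo $DEV > $SUB/namespaces/1/device_path
echo 1 > $SUB/namespaces/1/enable

PORT=/sys/kernel/config/nvmet/ports/1
mkdir -p $PORT
echo tcp   > $PORT/addr_trtype
echo ipv4  > $PORT/addr_adrfam
echo $ADDR > $PORT/addr_traddr
echo 4420  > $PORT/addr_trsvcid
ln -s $SUB $PORT/subsystems/$NQN
```

Something along these lines can be dropped into rc.local or a systemd oneshot unit until an autoload feature exists.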
So, an Ubuntu 20.04 VM target and an Ubuntu 20.04 VM initiator seem to work great, but ESXi as the initiator is still not seeing any target... must find a log to tell me what's going on...
@pkoutoupis please see: https://communities.vmware.com/t5/ESXi-Discussions/NVMEof-Datastore-Issues/td-p/2301440 "Issue was solved by Storage vendor. Released a new firmware, downgrade 4k to 512 volume block size supported by VMware." And I found the log:
Not exactly sure what "Failed to get source vmknic for NVMe/TCP traffic on vmnic4: Not found" means... also no idea if those...
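(One possible reading, not verified: ESXi could not find a VMkernel adapter enabled for NVMe/TCP on vmnic4. Tagging one would look something like the following, with the exact esxcli syntax treated as an assumption:)

```
# Assumed vmk interface name; pick the VMkernel adapter bound to vmnic4.
esxcli network ip interface tag add -i vmk1 -t NVMeTCP
esxcli network ip interface tag get -i vmk1
```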
Yes, for now, it is a manual process on each reboot. I have not added an autoload feature. In fact, I am currently working on a "RapidDisk OS" that will incorporate that autoload functionality. For now, using the current RapidDisk suite, users are limited to either manually running it or by using an rc.local (or other) script to have the same routines run during boot time. I did notice something above which may be an error (I could be wrong)...
You should instead be exporting the mapped device. Remember, RapidDisk volumes can be used either as independent RAM drives (i.e. /dev/rd0, /dev/rd1, et al.) or as a mapping of two volumes: the RAM drive as the cache and the backing store (the original volume, in your case /dev/md127). To access the mapping, you need to do so through the virtual block device that device-mapper creates for it.
But it is reassuring that your second VM is seeing the NQN. Something like this will connect the drive (you may need to modify the params) to the local VM, and it will then appear in the local NVMe device list.
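(Roughly, with nvme-cli and placeholder address/NQN values:)

```
# Placeholders: substitute the target's IP address and the exported NQN.
nvme discover -t tcp -a 192.168.1.10 -s 4420
nvme connect  -t tcp -a 192.168.1.10 -s 4420 -n nqn.2021-01.org.rapiddisk:example
nvme list   # the namespace should now show up as /dev/nvmeXnY
```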
Anyway, I am looking at the VMware log output. I am confused by the subnqn. Why are we seeing...
I'd love to see an OS like ESOS for NVMe!!!
@pkoutoupis More hacking on this tonight:
ESXi Logs (Auto Discovery):
And Manual input (Don't be fooled by the
I'm puzzled by... What's strange is how instantly it errors out. I don't really have any experience with NVMe/TCP, but I surmise that it would potentially use a Keep Alive Timeout (KATO) of...
See here: I wonder if we are seeing through the 512e and seeing the 4Kn... it's hard to tell what is 512 native (512n), 4K emulated as 512e, or 4K native (4Kn)...
I worked on this a bit more tonight. On previous trials, I was presenting virtual NVMe hard disks (in md raid 1) over RapidDisk NVMe/TCP back to the same ESXi host that was running the VM I was presenting storage from. Tonight I thought to try a non-NVMe virtual disk from the VM and use a different disk controller (LSI Logic Parallel instead of NVMe), thinking that maybe the virtual NVMe controller or virtual drives were influencing the 4096 block size. I knew this was a stretch, though, because I still have two virtual drives in an md raid 1 array and that is what I am fronting with RapidDisk. No difference in ESXi, same discovery error. 😞
I found this in regard to iSCSI with SPDK (an Intel-backed project for NVMe targets, I believe):
@pkoutoupis any github for your "RapidDisk OS"? I'd love to contribute anything I can. There are virtually no projects out there that I've been able to find that combine the caching and NVMe over RDMA/TCP that you've accomplished with RapidDisk!
@singlecheeze I appreciate your excitement. I am not ready to share that project publicly yet. I am hoping to do so within the next 6 months. There is a bit of polishing that needs to happen first. Trust me when I say, you will be one of the first I reach out to when I do. And when that does happen, I hope to have this issue with vSphere sorted out as it needs to be a non-issue when the OS does become more available.
@pkoutoupis 🎉🎉🎉
From a lead here: https://communities.vmware.com/t5/ESXi-Discussions/NVMEof-Datastore-Issues/td-p/2301440 |
Rapiddisk nqn comes across as:
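(For comparison, the NVMe base specification allows two NQN forms; the examples below are made up and are not the value rapiddisk reports:)

```
nqn.2014-08.org.nvmexpress:uuid:11111111-2222-3333-4444-555555555555   # UUID-based form
nqn.<yyyy-mm>.<reverse-domain>:<unique-string>                          # domain-name form, e.g. nqn.2016-06.io.spdk:cnode1
```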
All wonderful investigative work. I am caught up but am unclear on what we are suspecting. Are we doubting the NQN format from the rapiddisk application?
@pkoutoupis too hard to say at the moment... there is very little VMware documentation online, so I'm having to try a bunch of stuff. Going to work on it a bit more today.
This is the same exact issue: https://mymellanox.force.com/mellanoxcommunity/s/question/0D51T00007SkkB7SAJ/can-you-do-rocev2-without-a-switch
Let me know if you need a patch for the NQN. Or anything else. Again, I appreciate you putting all this work into this. You have been extremely helpful.
@pkoutoupis thank you very much. I'm using this issue to document some troubleshooting, as there is virtually nothing online as far as my Google fu can find... hope you don't mind :)
Tried this claim rule too, which didn't work:
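(For reference, vendor/model claim rules generally take this shape; the rule number and match strings below are examples only, and the exact option syntax should be double-checked:)

```
# Example only: add an HPP claim rule matching on vendor/model, then load and list.
esxcli storage core claimrule add -r 102 -t vendor -V NVMe -M "Linux*" -P HPP
esxcli storage core claimrule load
esxcli storage core claimrule list
```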
EUI (extended unique identifier): It looks like most storage vendors use SPDK, which exposes the EUI, and I'm wondering if VMware requires that versus a UUID (I have not seen how to use an EUI with the in-kernel NVMe target, or whether it is even available):
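(As far as I can tell, the in-kernel target only exposes UUID and NGUID namespace identifiers through configfs, with no obvious EUI-64 attribute; a rough sketch, assuming the subsystem/namespace naming used earlier:)

```
# Paths assume the placeholder subsystem/namespace created earlier.
NS=/sys/kernel/config/nvmet/subsystems/nqn.2021-01.org.rapiddisk:example/namespaces/1

echo 0 > $NS/enable                                  # identifiers are set while the namespace is disabled
uuidgen > $NS/device_uuid                            # namespace UUID
echo 11223344-5566-7788-99aa-bbccddeeff00 > $NS/device_nguid
echo 1 > $NS/enable
```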
Oddly, with the above-presented NVMe/TCP disk visible, no custom claim rules are needed (these are all default claim rules):
A helpful blog post on NVMe-related claim rules:
@pkoutoupis if you haven't seen this, please take a look: https://open-cas.github.io/ The team at Intel is doing Open CAS (OCF), which potentially looks very close to RapidDisk and bundles SPDK.
Interesting. I had never heard of this project and you are correct, it is oddly similar. I am not sure how I feel about that. Anyway, I should probably add diagrams into my project README. Maybe that will avoid some of the questions I keep getting asked about the project. |
#82 (comment)
And what happens if you remove the wildcard from after...
What if you use a wildcard for the vendor and Linux for the model without a wildcard?
My Google Fu must be on 🔥 this morning (though from 2 years ago): https://lore.kernel.org/all/[email protected]/T/
Still same result:
The plot thickens with SPDK (just using one RAM device/drive, not an MD array; even though this is all for testing, it is actually a VMDK sitting on an iSCSI datastore in a VM, and I'm just re-presenting that storage back to the ESXi host the VM lives on):
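(For anyone reproducing this, a minimal SPDK NVMe/TCP target with a single RAM bdev looks roughly like the following; the listen address is a placeholder and the names just follow the SPDK examples:)

```
# Placeholders: adjust the listen address; names follow the SPDK examples.
./build/bin/nvmf_tgt &

scripts/rpc.py nvmf_create_transport -t TCP
scripts/rpc.py bdev_malloc_create -b Malloc0 1024 512    # 1024 MiB RAM bdev, 512-byte blocks
scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001
scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -f ipv4 -a 192.168.1.10 -s 4420
```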
Results in:
ESXi Sees:
Hmmmm. Remember your comment from earlier?
Notice how the... I wonder if THIS is the reason. Anyway, there was an attempt to implement this in the kernel, but it was met with a lot of resistance: http://lists.infradead.org/pipermail/linux-nvme/2021-January/021783.html
What version of ESXi are you using? According to the Lightbits website, version 7.0U3 has been certified with their Lightbits OS using the same driver stack (they developed the NVMe TCP driver).
7.0U3d (latest). I've talked to them recently and think they are using SPDK.
If that is indeed the case, then it won't be possible to have the hypervisor support the exported volumes. It would only be possible to export them directly to the VMs that support NVMe TCP without fused commands. We may need to rethink this whole strategy. Maybe for VMware, just limit things to writing a Python wrapper script that lives in the scripts subdirectory and wraps around SPDK, while leaving the current implementation in place.
So, I asked the question and one of the co-founders answered: https://koutoupis.com/2022/04/22/vmware-lightbits-labs-and-nvme-over-tcp/#comment-56
Well done!!! More adventures in SPDK land, @pkoutoupis
So, how did we want to resolve this ticket? Some thoughts came to my mind:
Let me experiment with using SPDK with RapidDisk; happy to create a pull request if I can get something pulled together.
@singlecheeze Hello. I am going through the open tickets and wanted to follow up on this.
@pkoutoupis I tried to do some benchmarking with SPDK but it was very unstable: https://gist.github.com/singlecheeze/0bbc2c29a5b6670887127b93f7b71e3f Additionally, with the Broadcom/VMware announcement, I'm moving away from VMware and looking at other hypervisors, namely XCP-ng, which I believe is Debian based, so maybe we could do an NVMe over TCP plug-in for that. In any case, I think this one is good to close.
Thank you very much for the update.
It would be wonderful 🎉 to have 512e block size as an option for rapiddisk devices.
They appear to only be 4k:
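A quick way to check (assuming the device is /dev/rd0):

```
# Assuming the rapiddisk volume appears as /dev/rd0.
lsblk -o NAME,LOG-SEC,PHY-SEC,MIN-IO /dev/rd0
blockdev --getbsz /dev/rd0    # the value that reports 4k
```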
This is based on the documentation found here, and specifically that VMware ESXi requires 512e block size devices.
Here is an example config:
And the error I am seeing in vSphere (most fields do not require input as they default to specific settings; I've tried changing just about everything and get the same error, "Error during discover controllers"):
My hunch is that it is successfully connecting to the NVMe/TCP target (details above), but it sees the 4k rather than 512-byte block size and then throws an error. Though I have been unable to find a log that says specifically what the problem is so far.