You have probably received an email on January 11 saying that a significant part of Jean-Zay will be shut down on February 5, 2024 in order to install new GPU nodes.
For now, here is a summary of my understanding of the situation. I'll update it when I get the time. Feel free to comment below (or edit if you have the rights).
Summary of the situation
45% of the V100 GPUs will be stopped on February 5 and removed from the cluster to make room for the new H100 nodes
the timing is not clear, but the hope is that the new H100 nodes will be available to users at the beginning of September. Take this with a huge grain of salt: it comes entirely from informal feedback, and IDRIS, which operates Jean-Zay, has not communicated on this
Adastra work-around
If you feel motivated/proactive/curious/bored, one possible work-around is to apply to Adastra. I created a pad so that anyone can add relevant information there. At the time of writing (mid-January), it is pretty much anyone's guess whether this will be worth it for your particular use case ...
Here are a few things to know about Adastra:
Adastra has AMD GPUs (not NVIDIA), but I suspect most people around us don't care since they use PyTorch or similar frameworks that work on AMD GPUs. I am trying to gather feedback from people who have used AMD GPUs. If you have some, comment in this issue!
the simplest option is to apply for dynamic access ("accès dynamique"). The procedure is likely similar to the Jean-Zay one but may have some small differences. In principle you can get access in a few days. In practice we will see ...
once you have access, you need to set things up again (datasets, conda environments, ssh set-up, etc.). If you use modules (module load), Adastra likely has some, but they are probably slightly different from the Jean-Zay ones
Adastra has had lingering software and hardware issues since it started, but maybe these are more specific to HPC than AI use cases? This partly explains why Adastra is currently underused and why it is recommended as a fall-back option.
the Adastra support team is already quite busy working hard to fix these issues. It is an open question how they will handle the load if plenty of users try to migrate to Adastra in a short amount of time
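If you do get access and rebuild your environment, a quick sanity check is to ask PyTorch which backend it was built for and whether it sees a GPU. Below is a minimal sketch, not something tested on Adastra. It relies on ROCm builds of PyTorch exposing `torch.version.hip` and reusing the `torch.cuda` API for AMD GPUs. The helper names `describe_backend` and `check_pytorch` are made up for illustration.

```python
import importlib.util


def describe_backend(hip_version, cuda_version, gpu_available):
    """Classify a PyTorch build from its version attributes (hypothetical helper)."""
    if hip_version is not None:
        backend = "ROCm (AMD)"
    elif cuda_version is not None:
        backend = "CUDA (NVIDIA)"
    else:
        backend = "CPU-only"
    status = "GPU visible" if gpu_available else "no GPU visible"
    return f"{backend} build, {status}"


def check_pytorch():
    """Report the backend of the PyTorch installed in the current environment."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch

    # torch.version.hip is set on ROCm builds, torch.version.cuda on CUDA builds.
    hip = getattr(torch.version, "hip", None)
    cuda = getattr(torch.version, "cuda", None)
    # On ROCm builds, AMD GPUs are reached through the torch.cuda namespace.
    return describe_backend(hip, cuda, torch.cuda.is_available())


if __name__ == "__main__":
    print(check_pytorch())
```

Run it on a compute node rather than a login node, since login nodes often have no GPU attached.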
lesteve changed the title from "Jean-Zay planned February 5 partial stoppage and possible work-arounds" to "⚠️ 🔧 Jean-Zay planned February 5 partial stoppage and possible work-arounds 🔧 ⚠️" on Jan 17, 2024