QoL changes: Add GPU types view, remove dtype and shapes info. No DB or functionality. #52

alexzhang13 · 2024-12-13T08:30:23Z

A bunch of quality of life changes, as well as a change to GPU types.

Adds:

Checkers and ephemeral info when leaderboard creation is invalid (duplicate key, invalid dates, etc.) in try-catch.
New option and view to select GPU during problem creation.

Removes:

dtype and shape information in /leaderboard submit ....
dtype and shape information in /leaderboard create ....

Leftover TODOs:

(urgent ⚠️!) Add DB changes to reflect new GPUs. @b9r5 let's talk about this more, but I think we should just have either score point to a DB with each valid GPU type (eh), or add a score for each possible GPU in the DB.
Add leaderboard delete (dangerous! we should also have special UI for this so users are sure)
Update /leaderboard list to also include all available GPU types for the particular problem.

Checklist

Before submitting this PR, ensure the following steps have been completed:

Run the slash command /verifyruns on your own server.
- Run the cluster bot on your server:
```
python discord-bot.py
```
- Start training runs with the slash command /verifyruns.
- Verify that the bot eventually responds with:
```
✅ All runs completed successfully!
```
  (It may take a few minutes for all runs to finish. In particular, the GitHub
  runs may take a little longer. The Modal run is typically quick.)
  For more information on running a cluster bot on your own server, see
  README.md.

…changes

b9r5 · 2024-12-15T00:32:06Z

src/discord-cluster-manager/utils.py

+    try:
+        await interaction.response.send_message(msg)
+
+    except Exception:
+        await interaction.followup.send(msg)


I wonder if this would be better written as the following?

Suggested change

-    try:
-        await interaction.response.send_message(msg)
-
-    except Exception:
-        await interaction.followup.send(msg)
+    if interaction.response.is_done():
+        await interaction.followup.send(msg)
+    else:
+        await interaction.response.send_message(msg)

We do call interaction.response.defer() when we intend to use followup. And that call to defer() ends up making is_done() return True.

What makes me uncomfortable about except Exception is that we don't really know what's gone wrong.

Honestly, I think it would work either way, but I think I prefer my suggestion if it works.

I've changed this; I think it makes sense. I'll verify that it works locally before I push, though. I also renamed the function to the more general send_discord_message.

I also changed the logic for use_followup=True in the followup commit.
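
For reference, the is_done() approach discussed above could look roughly like the sketch below. Only send_discord_message and the interaction.response / interaction.followup calls come from this thread. The Fake* classes are hypothetical stand-ins so the branch logic can be exercised without a live Discord connection, and the sketch leaves out the use_followup parameter mentioned above.

```python
import asyncio


async def send_discord_message(interaction, msg: str) -> None:
    """Send msg via followup if the interaction was already answered or deferred."""
    if interaction.response.is_done():
        # defer() or an earlier send_message() already used the initial response.
        await interaction.followup.send(msg)
    else:
        await interaction.response.send_message(msg)


# Hypothetical stand-ins for a Discord interaction, for local testing only.
class FakeResponse:
    def __init__(self, log):
        self._done = False
        self._log = log

    def is_done(self):
        return self._done

    async def defer(self):
        self._done = True

    async def send_message(self, msg):
        self._done = True
        self._log.append(("response", msg))


class FakeFollowup:
    def __init__(self, log):
        self._log = log

    async def send(self, msg):
        self._log.append(("followup", msg))


class FakeInteraction:
    def __init__(self):
        self.sent = []
        self.response = FakeResponse(self.sent)
        self.followup = FakeFollowup(self.sent)


async def demo():
    # Case 1: nothing sent yet, so the initial response is used.
    fresh = FakeInteraction()
    await send_discord_message(fresh, "hello")

    # Case 2: the command deferred first, so followup is used.
    deferred = FakeInteraction()
    await deferred.response.defer()
    await send_discord_message(deferred, "hello")
    return fresh.sent, deferred.sent
```

Compared with the try/except version, the explicit check does not swallow unrelated errors from send_message, which was the concern about except Exception.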

b9r5 · 2024-12-15T01:22:50Z

src/discord-cluster-manager/leaderboard_db.py

            self.connection.rollback()  # Ensure rollback if error occurs
+            return f"Error during leaderboard creation: {e}"


I'm nervous about returning the value with the stringified exception in it. The reason is that I'm not sure what's going to be in e. I wouldn't expect it to include sensitive information (like the database URL), but it could. What I would suggest is to log the full string (f"Error during leaderboard creation: {e}") but only return "Error during leaderboard creation".

Good point. For now I won't log it (we can change this later), but I will remove the exception text from the returned message.
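
A minimal sketch of the "log the details, return a generic message" suggestion, in case we add logging later. It assumes sqlite3 and a hypothetical one-column leaderboard table purely so the example is self-contained. The real leaderboard_db.py code, its schema, and its database driver are not shown in this thread, and create_leaderboard here is an illustrative free function, not the actual method.

```python
import logging
import sqlite3

logger = logging.getLogger("leaderboard_db")


def create_leaderboard(connection: sqlite3.Connection, name: str):
    """Insert a leaderboard row; return None on success or a generic error string."""
    try:
        connection.execute("INSERT INTO leaderboard (name) VALUES (?)", (name,))
        connection.commit()
        return None
    except Exception:
        connection.rollback()  # Ensure rollback if error occurs
        # Full details (message and traceback) go to the log only.
        logger.exception("Error during leaderboard creation")
        # The caller, and therefore the Discord user, sees no exception text.
        return "Error during leaderboard creation"


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the name is the primary key, so duplicates fail.
    conn.execute("CREATE TABLE leaderboard (name TEXT PRIMARY KEY)")
    first = create_leaderboard(conn, "softmax")
    second = create_leaderboard(conn, "softmax")  # duplicate key
    return first, second
```

With this shape, anything sensitive that ends up in the exception text stays in the server logs and is never echoed back into the channel.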

b9r5 · 2024-12-15T01:27:41Z