Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QoL changes: Add GPU types view, remove dtype and shapes info. No DB or functionality. #52

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

alexzhang13
Copy link
Collaborator

@alexzhang13 alexzhang13 commented Dec 13, 2024

A bunch of quality of life changes, as well as a change to GPU types.

Adds:

  • Checkers and ephemeral info when leaderboard creation is invalid (duplicate key, invalid dates, etc.) in try-catch.
  • New option and view to select GPU during problem creation.
    image

Removes:

  • dtype and shape information in /leaderboard submit ....
  • dtype and shape information in /leaderboard create ....

Leftover TODOs:

  • (urgent ⚠️!) Add DB changes to reflect new GPUs. @b9r5 let's talk about this more, but I think we should just have either score point to a DB with each valid GPU type (eh), or add a score for each possible GPU in the DB.
  • Add leaderboard delete (dangerous! we should also have special UI for this so users are sure)
  • Update /leaderboard list to also include all available GPU types for the particular problem.

Checklist

Before submitting this PR, ensure the following steps have been completed:

  • Run the slash command /verifyruns on your own server.
    • Run the cluster bot on your server:
      python discord-bot.py
    • Start training runs with the slash command /verifyruns.
    • Verify that the bot eventually responds with:
      ✅ All runs completed successfully!
      
      (It may take a few minutes for all runs to finish. In particular, the GitHub
      runs may take a little longer. The Modal run is typically quick.)
      For more information on running a cluster bot on your own server, see
      README.md.

@alexzhang13 alexzhang13 self-assigned this Dec 14, 2024
Comment on lines 67 to 71
try:
await interaction.response.send_message(msg)

except Exception:
await interaction.followup.send(msg)
Copy link
Collaborator

@b9r5 b9r5 Dec 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this would be better written as the following?

Suggested change
try:
await interaction.response.send_message(msg)
except Exception:
await interaction.followup.send(msg)
if interaction.response.is_done():
await interaction.followup.send(msg)
else:
await interaction.response.send_message(msg)

We do call interaction.response.defer() when we intend to use followup. And that call to defer() ends up making is_done() return True.

What makes me uncomfortable about except Exception is that we don't really know what's gone wrong.

Honestly, I think it would work either way, but I think I prefer my suggestion if it works.

Copy link
Collaborator Author

@alexzhang13 alexzhang13 Dec 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this, I think this makes sense. Will verify that this works locally before I push though. Also renamed more generally to send_discord_message

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also changed the logic for use_followup=True in the followup commit.

self.connection.rollback() # Ensure rollback if error occurs
return f"Error during leaderboard creation: {e}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm nervous about returning the value with the stringified exception in it. The reason is that I'm not sure what's going to be in e. I wouldn't expect it to include sensitive information (like the database URL), but it could. What I would suggest is to log the full string (f"Error during leaderboard creation: {e}") but only return "Error during leaderboard creation".

Copy link
Collaborator Author

@alexzhang13 alexzhang13 Dec 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, for now I won't log it (we can change this later) but will remove error message inside.

Comment on lines +317 to +321
if "duplicate key" in err:
await interaction.followup.send(
f'Error: Tried to create a leaderboard "{leaderboard_name}" that already exists.',
ephemeral=True,
)
Copy link
Collaborator

@b9r5 b9r5 Dec 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like in the within the create_leaderboard implementation in leaderboard_db.py, we should examine the message in e and sanitize it. We could return a message containing "duplicate key" if we found that string, and we could check for other strings that we've found to occur. I just want to be careful not to leak sensitive information by returning the full string of e.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure what you mean by this, do you mean we sanitize the "err" string even if we never print it out?

})

await interaction.response.send_message(
if err:
print(err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(err)

...or log the error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed these, this should not have been in the commit. I was debugging at the time, thanks for catching!

)
else:
# Handle any other errors
await interaction.followup.send(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good idea to log the error in this case, so that we can look at the logs and then add more informative error messages.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is fair, I've added logging information for the specific error here, but the Discord bot produces a fixed string.

Comment on lines 338 to 342
if "already exists" in str(e):
await interaction.followup.send(
str(e),
ephemeral=True,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could we get here? Shouldn't this be handled in or near the `"duplicate key" area above?

(I know I'm sounding like a broken record but) we shouldn't send str(e) to the Discord user. We should log a message containing e and send a generic message to the user like "Leaderboard already exists".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has to do with why I ended up re-factoring create_leaderboard to return an error string, because it wasn't being captured in the try except. I've removed this, but also made sure to remove the print errors on Discord and add them to a logger instead.

# Ask the user to select GPUs
view = GPUSelectionView([gpu.name for gpu in GitHubGPU])

await interaction.response.send_message(
Copy link
Collaborator

@b9r5 b9r5 Dec 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be interaction.followup.send(...)? If there are going to be more messages in this interaction, then interaction.response.send_message(...) will cause problems.

I think the easiest thing to do in this function is to try to only send one response, and use interrupt.response.send_message(...). And return immediately after the message is sent. Using followup can definitely be confusing.

Copy link
Collaborator Author

@alexzhang13 alexzhang13 Dec 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've wrapped this around our send_discord_message util to avoid this issue.

try:
date_value = datetime.strptime(deadline, "%Y-%m-%d")
except ValueError:
raise ValueError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do interaction.response.send_message("Invalid date format. Please use...") and then return here. It's a simpler control flow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this, there was also a mistake in here regarding bad implementations I forgot to delete. It's been simplified.

… to loggers). Remove use_followup with wrapper code to avoid giving user access to flags
@alexzhang13 alexzhang13 requested review from b9r5 and removed request for msaroufim and S1ro1 December 15, 2024 07:37
@b9r5
Copy link
Collaborator

b9r5 commented Dec 16, 2024

Hey Alex, thanks for the work on this PR. I've taken another look at the changes and wanted to share my thoughts.

First, generally, from my experience I've found that it's easy to unintentionally break things in this discord.py implementation. I've been trying to stick to small, incremental changes that are quicker to review, and easier to test (and rollback if needed). While it's early days and manual testing is still necessary, I've found this approach helps reduce the testing burden and ensure quicker write/deploy/test cycles.

Given that, my recommendation is to break this PR into smaller pieces:

a. worflow.yml changes. Removing the processing here is a good improvement and ties into getting /verifyruns working again. I was wondering if we should use the text none as a placeholder value for missing scores instead of removing the functionality, but I'm also OK with this approach.

b. send_discord_message. I'm not sure about the approach and think it would be helpful to get Siro's input before deciding on the best path forward.

c. Whitespace and string style differences. The changes here make me wonder if we should standardize on formatting. Has this been discussed yet? I'm open to a discussion on this to avoid inconsistency (and any poor style choices on my part 🙂) without spending too much time on formatting reviews.

d. GPU selection view changes. These look quite cool.

@msaroufim
Copy link
Member

msaroufim commented Dec 16, 2024

On styling lemme just use ruff and do a one time lint of the repo. Better to pull the Bandaid now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants