
Question about top auc metric #24

Open
vincrichard opened this issue Jun 29, 2023 · 2 comments

@vincrichard

Thank you for sharing your code and your work on benchmarking.

I have a question regarding the top_auc metric, especially this part:

```python
if finish and len(buffer) < max_oracle_calls:
    sum += (max_oracle_calls - len(buffer)) * top_n_now
return sum / max_oracle_calls
```

If I understand correctly, since we keep a dictionary of SMILES and their corresponding scores, this condition will always evaluate to true whenever uniqueness is less than 1. This could create issues if the model generates a large number of duplicate molecules.
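To make the discussion concrete, here is my reading of the whole computation as a self-contained sketch. Only the three quoted lines come from the repository; the surrounding structure, the `(score, call_index)` layout of `buffer`, and names like `freq_log` are my assumptions, and I renamed `sum` to `total` to avoid shadowing the builtin:

```python
import numpy as np

def top_auc(buffer, top_n, finish, freq_log, max_oracle_calls):
    # buffer: dict mapping canonical SMILES -> (score, call_index).
    # Because it is a dict, duplicates collapse to a single entry, so
    # len(buffer) < max_oracle_calls whenever uniqueness < 1.
    total, prev, called = 0.0, 0.0, 0
    # Order molecules by the call at which they were first scored.
    ordered = sorted(buffer.items(), key=lambda kv: kv[1][1])
    for idx in range(freq_log, min(len(buffer), max_oracle_calls), freq_log):
        top_now = np.mean(sorted((v[0] for _, v in ordered[:idx]), reverse=True)[:top_n])
        total += freq_log * (top_now + prev) / 2  # trapezoidal slice
        prev = top_now
        called = idx
    # Unique molecules left over after the last full freq_log slice.
    top_now = np.mean(sorted((v[0] for _, v in ordered), reverse=True)[:top_n])
    total += (len(buffer) - called) * (top_now + prev) / 2
    # The branch in question: the gap left by duplicates is filled with
    # the final, i.e. best, top-n average.
    if finish and len(buffer) < max_oracle_calls:
        total += (max_oracle_calls - len(buffer)) * top_now
    return total / max_oracle_calls
```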

One edge case would be a model that, whenever it generates a SMILES, necessarily generates two more identical copies of it, so that only 1/3 of the 10,000 generated molecules are unique. Let's assume this model learns, so the score of each new unique molecule increases across oracle calls. In this case I think the current implementation of top_auc includes a bias. I think it's clearer with graphs:
[figure: top_auc = 0.5831]
[figure: top_auc = 0.4995]

The values added to sum consist of:
- `update_frequency * [(top_k_scores + prev) / 2]` inside the for loop,
- `(len(scores) - limit_update) * [(top_k_scores + prev) / 2]` after the loop, and
- `(len(results) - len(scores)) * top_k_scores` in the final branch.
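As a rough illustration, simulating the triplicate corner case on top of the sketch above (placeholder SMILES and a linear score ramp; the exact value depends on `freq_log`, `top_n`, and the score schedule, so it will not match my figures exactly):

```python
max_calls, freq_log, top_n = 10_000, 100, 10

buffer = {}
for call in range(1, max_calls + 1):
    unique_id = (call - 1) // 3   # every molecule is emitted three times in a row
    smiles = f"C{unique_id}"      # placeholder "molecules"
    score = call / max_calls      # scores improve linearly over the run
    if smiles not in buffer:      # the dict keeps only the first occurrence
        buffer[smiles] = (score, len(buffer) + 1)

print(len(buffer))                                        # 3334 unique out of 10,000 calls
print(top_auc(buffer, top_n, True, freq_log, max_calls))  # ~0.83: inflated by the duplicate gap
```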

Given this, I think it is apparent that removing duplicates biases the top_auc computation. I am wondering if I am missing something and whether this is expected. Could you enlighten me on this? What is the intuition behind it?

@wenhao-gao
Owner

Thank you for your comment. The storage buffer for molecules in the form of canonical SMILES is implemented as a dictionary. This ensures that identical molecules do not appear more than once.
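A minimal illustration of that dedup behaviour, with hypothetical SMILES and simplified to plain scores:

```python
buffer = {}
for smi, score in [("CCO", 0.3), ("CCO", 0.3), ("c1ccccc1", 0.7)]:
    buffer.setdefault(smi, score)  # the second "CCO" is silently dropped
print(len(buffer))  # 2, even though the model made 3 calls
```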

@vincrichard
Author

@wenhao-gao Thank you for your answer.
My main concern is precisely that it does not track molecules appearing more than once. As I tried to explain above, keeping only one copy of each identical molecule, rather than every generated molecule, creates a bias in my opinion.
Right now, duplicate molecules are effectively treated as if they had the best score of the run, which creates the final plateau on the graph above.

Counting every molecule means that a model whose score increases linearly over time gets a top_auc of 0.5 (in this corner case, the best score is reached on the last call). A score of 0.5 for this corner case makes sense to me: as I understand it, top_auc is meant to measure how quickly a model converges to a high score, so linear convergence is analogous to the random baseline for a regular AUC, i.e. one of the worst-case scenarios. The issue with the current top_auc is that the score only equals 0.5 when there are no duplicates.
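For comparison, a minimal sketch of the variant I have in mind, which integrates over every oracle call, duplicates included. It assumes the per-call scores are simply logged in order in a list (`history` is a name I made up):

```python
import numpy as np

def top_auc_all_calls(history, top_n, max_oracle_calls):
    # history: one score per oracle call, duplicates included.
    total, prev = 0.0, 0.0
    scores = []
    for score in history[:max_oracle_calls]:
        scores.append(score)
        top_now = np.mean(sorted(scores, reverse=True)[:top_n])
        total += (top_now + prev) / 2  # trapezoid of width one call
        prev = top_now
    return total / max_oracle_calls

# A linearly improving model scores ~0.5, with or without duplicates:
print(top_auc_all_calls([t / 10_000 for t in range(1, 10_001)], 10, 10_000))
```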

Another example would be a model that stagnates on a small subset of molecules with low scores and then manages to reach high-scoring molecules at the end. This looks like the following graphs. You can clearly see that the plateau is not taken into account by the current top_auc. That plateau could have occurred at the start with a score of 0 or at the end with a score of 1; the top_auc metric would not change.

[figure: top_auc = 0.959725]
[figure: top_auc = 0.584775]
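The stagnation scenario can be simulated the same way with the hypothetical top_auc sketch from my first message (all numbers below are made up for illustration):

```python
buffer = {}
for k in range(100):                       # 100 low-scoring unique molecules
    buffer[f"low{k}"] = (0.1, len(buffer) + 1)
# ~7,900 calls of pure duplicates leave no trace in the dict...
for k in range(2000):                      # ...then 2,000 high scorers at the end
    buffer[f"high{k}"] = (0.9 + 0.1 * k / 2000, len(buffer) + 1)
print(top_auc(buffer, 10, True, 100, 10_000))  # ~0.98: the plateau is invisible
```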

I understand the examples I am showing are corner cases, since they involve a lot of duplicate molecules. However, I think this issue can affect the score in real cases as well; it is just less visible.
Once again, I am not 100% confident, which is why I would like your opinion on this. If you could explain your reasoning for keeping only one copy of each duplicated molecule, it would be appreciated.
