Question about top auc metric #24
Comments
Thank you for your comment. The storage buffer for molecules, keyed by canonical SMILES, is implemented as a dictionary. This ensures that identical molecules do not appear more than once.
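A minimal sketch of the buffer described above: a dict keyed by canonical SMILES, so re-submitting the same molecule never adds an entry. Here `canonical` is a placeholder for a real canonicalizer (e.g. RDKit's `Chem.MolToSmiles`); the names below are illustrative, not the repository's code.

```python
def canonical(smiles: str) -> str:
    # Placeholder: pretend the input is already a canonical SMILES string.
    return smiles

mol_buffer = {}

def add_molecule(smiles: str, score: float, call_idx: int) -> None:
    key = canonical(smiles)
    if key not in mol_buffer:          # duplicates never add or overwrite an entry
        mol_buffer[key] = (score, call_idx)

add_molecule("CCO", 0.7, 1)
add_molecule("CCO", 0.7, 2)  # duplicate oracle call: buffer is unchanged
add_molecule("CCN", 0.4, 3)
print(len(mol_buffer))  # → 2
```

Note that with this design the buffer length counts unique molecules, not oracle calls, which is exactly the gap the discussion below is about.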
@wenhao-gao Thank you for your answer. Keeping every molecule means that a model whose score increases linearly over time gets a score of 0.5 (in this corner case the best score is reached on the last oracle call). A score of 0.5 here makes sense to me: as I understand it, top_auc measures how quickly a model converges to a high score, so linear convergence is analogous to the random baseline of a regular AUC, i.e. one of the worst-case scenarios. The issue with the current top_auc is that the score is 0.5 only if there are no duplicates. Another example: a model that stagnates on a small subset of low-scoring molecules and only reaches high-scoring molecules at the end. This would look like the following graph. You can clearly see that the plateau is not taken into account by the current AUC; the plateau could sit at the start with a score of 0 or at the end with a score of 1, and the top_auc metric would not change. I understand the examples I am showing are corner cases, since they involve many duplicate molecules. However, I think this issue can affect the score in real cases as well; it is just less visible.
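The linear corner case above can be checked with a small sketch: a plain trapezoidal AUC of the best-so-far score over all oracle calls (a top-1 simplification of mine, not the repository's top_auc).

```python
# Trapezoidal AUC of the best-so-far score, normalised to [0, 1].
# Simplified stand-in for top_auc: top-1 only, every call stays on the x-axis.
def auc_best_so_far(scores):
    best, curve = 0.0, []
    for s in scores:
        best = max(best, s)
        curve.append(best)
    if len(curve) < 2:
        return best
    area = sum((a + b) / 2 for a, b in zip(curve, curve[1:]))
    return area / (len(curve) - 1)

n = 1000
linear = [i / (n - 1) for i in range(n)]   # score rises linearly per call
print(round(auc_best_so_far(linear), 3))   # close to 0.5, the "random" baseline

# A long low plateau followed by a late climb: when duplicates are kept on
# the x-axis, the plateau drags the AUC well below 0.5.
plateau = [0.1] * 900 + [i / 99 for i in range(100)]
print(round(auc_best_so_far(plateau), 3))  # ≈ 0.14
```

If the duplicate calls making up the plateau were dropped before computing the AUC, the plateau would vanish from the curve, which is the distortion described above.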
Thank you for sharing your code and your work on benchmarking.
I have a question regarding the `top_auc` metric, especially this part: `mol_opt/main/optimizer.py`, lines 46 to 48 at commit caa98f3.
If I understand correctly, given a dictionary of SMILES and their corresponding scores, these conditions will always evaluate to true whenever uniqueness is less than 1. This could create issues if the model generates a large number of duplicate molecules.
One edge case would be a model that, every time it generates a SMILES, necessarily generates 2 other identical SMILES, so only 1/3 of its 10,000 molecules are unique. Let's assume this model learns, so the score of each new unique molecule it generates increases across oracle calls. In this case I think the current implementation of top_auc includes a bias. I think it's clearer with graphs:
[graph 1: top_auc = 0.5831]
[graph 2: top_auc = 0.4995]
The values added to the sum consist of:

- `[(top_k_scores + prev) / 2] * update_frequency` in the for loop,
- `(len(scores) - limit_update) * [(top_k_scores + prev) / 2]`, and
- `(len(results) - len(scores)) * [top_k_scores]`.

In this case, I think it is apparent that removing duplicates biases the top_auc computation. I am wondering if I am missing something and whether this is expected. Could you enlighten me on this? What is the intuition behind it?