
Discrepancies in Reproducing Results for Hyperparameter Tuning in REINVENT #28

Open
ankur56 opened this issue Dec 5, 2023 · 3 comments

Comments

ankur56 commented Dec 5, 2023

Subject: Inconsistencies in Replicating Results from Supplemental Information (Figures 14 and 15, Section D.2)

Hello,

We've been working on reproducing the results for hyperparameter tuning in REINVENT, specifically for the zaleplon_mpo and perindopril_mpo oracles as presented in Figures 14 and 15, Section D.2 of the supplemental information. Despite following the installation and execution instructions in the README, our results differ from those published.

Issue Details:

  1. Discrepancy in Mean AUC Top-10 for zaleplon_mpo:
  • Published Result: Table 4 reports a mean AUC Top-10 of 0.358±0.062 across five independent runs.
  • Our Result: We observed a mean AUC Top-10 of 0.503±0.02.
  2. Performance Difference Between Sigma Values:
  • Published Behavior: A significant performance difference is reported between sigma values of 500 and 60 (Figure 14, Section D.2).
  • Our Observation: We found minimal performance difference between these sigma values (mean AUC Top-10 of 0.503 for sigma=500 vs 0.482 for sigma=60) for zaleplon_mpo.
  3. Other Discrepancies: We also noted discrepancies in several other mean AUC Top-10 values reported in Table 4.
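
To make the numbers above concrete, this is how we are computing the mean AUC Top-10: the average, over the oracle-call budget, of the running mean of the ten best scores seen so far. A minimal sketch of that calculation (our reading of the benchmark's metric, not the repository's exact implementation; the 10,000-call budget, the hold-at-last-value padding for early stopping, and the lack of deduplication are assumptions on our part):

```python
def auc_top10(scores, max_oracle_calls=10000):
    """Approximate AUC Top-10 for one run.

    `scores` is the sequence of oracle values in the order they were queried.
    At each call we track the mean of the 10 best scores seen so far, then
    average that curve over the full call budget, giving a value in [0, 1]
    for oracles bounded by 1.
    """
    top10_curve = []
    best = []
    for s in scores:
        best.append(s)
        best = sorted(best, reverse=True)[:10]
        top10_curve.append(sum(best) / len(best))
    if not top10_curve:
        return 0.0
    # If the run stopped before exhausting the budget, hold the curve at its
    # last value for the remaining calls (assumption about early stopping).
    if len(top10_curve) < max_oracle_calls:
        top10_curve += [top10_curve[-1]] * (max_oracle_calls - len(top10_curve))
    return sum(top10_curve[:max_oracle_calls]) / max_oracle_calls

# Mean and std over independent runs (e.g. 5 runs of zaleplon_mpo), where
# `run_scores` is a list of per-run score sequences loaded from disk:
# from statistics import mean, stdev
# aucs = [auc_top10(run) for run in run_scores]
# print(mean(aucs), stdev(aucs))
```

If our definition above differs from the one used for Table 4, that alone could explain part of the gap, so please let us know.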

Seeking Clarification:

We would like to thoroughly analyze the behavior of the hyperparameter sigma and ensure the accuracy of our results. Could you please help us verify that our methodology aligns with your implementation? We want to ensure that there are no overlooked mistakes on our end or potential bugs in the code.

Any insights or suggestions you could provide would be greatly appreciated.

Thank you for your assistance.

MorganCThomas (Contributor) commented Dec 5, 2023

I noticed this as well, so I looked into it and found that there were some bug fixes in TDC after the benchmark was published that affected the following oracles: zaleplon_mpo, sitagliptin_mpo, C11H24, C9H10N2O2PF2Cl. This has led to inconsistent benchmark results since publication, especially since zaleplon_mpo was used for hyperparameter tuning.
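
For anyone trying to pin this down, a quick sanity check is to record the installed TDC version and the scores the affected oracles assign to a fixed molecule, then compare across environments. A minimal sketch assuming PyTDC's `Oracle` interface (the distribution name `PyTDC` and the placeholder SMILES are assumptions; substitute molecules from your own runs):

```python
# Print the installed PyTDC version and the scores the affected MPO oracles
# assign to a fixed molecule. If these numbers differ between two
# environments, the oracle definitions have changed between TDC releases.
from importlib.metadata import version
from tdc import Oracle

print("PyTDC version:", version("PyTDC"))

smiles = "CC1=CC=C(C=C1)C(=O)NC2=CC=CC=C2"  # placeholder test molecule
for name in ["zaleplon_mpo", "sitagliptin_mpo"]:
    oracle = Oracle(name=name)
    print(name, oracle(smiles))
```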

For now, when comparing benchmark results, I omit these oracles. But this also raises a question: are you planning to update the publication with corrected results?

ankur56 (Author) commented Dec 5, 2023

@MorganCThomas Thank you for your reply. We have also noticed a minor deviation in the reported values for the perindopril_mpo oracle, which was the other target of the hyperparameter search in the publication. Given this, it may not be accurate to conclude that sigma=500 is optimal for these oracles compared with the sigma values of 60 and 120 used in previous REINVENT studies.
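
For readers less familiar with REINVENT, the reason sigma matters so much is that the oracle score only enters the loss through the augmented likelihood, logP_aug = logP_prior + sigma * S(x), and the agent is trained to minimise (logP_aug - logP_agent)^2. A minimal sketch of that objective (simplified from the published REINVENT formulation; the toy log-likelihoods and score below are illustrative only):

```python
def reinvent_loss(logp_prior, logp_agent, score, sigma):
    """Per-sequence REINVENT objective: squared distance between the agent
    log-likelihood and the sigma-augmented prior log-likelihood."""
    augmented = logp_prior + sigma * score
    return (augmented - logp_agent) ** 2

# Toy numbers: agent and prior agree on the sequence, oracle score is 0.5.
for sigma in (60, 120, 500):
    print(sigma, reinvent_loss(logp_prior=-30.0, logp_agent=-30.0,
                               score=0.5, sigma=sigma))
# sigma=60 -> 900.0, sigma=500 -> 62500.0: at sigma=500 the score term
# dwarfs the prior likelihood, so the agent is pushed much harder away from
# the prior than at the values used in earlier REINVENT work.
```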

MorganCThomas (Contributor) commented:
Good to know, thanks. I've assessed REINVENT before here and here (Fig 3e-g), and I would never recommend using sigma=500; based on personal experience, I didn't even think to test a value that high.
