Which data we need to use ? #7

Open
thegodone opened this issue Aug 4, 2024 · 5 comments

Comments

@thegodone

I would like to understand which datasets are needed, and their sources if available:

  • USPTO ?
  • USPTO-stereo ?
  • USPTO-MIT ?
  • ...
@avaucher
Member

avaucher commented Aug 4, 2024

This GitHub repository is independent of any specific dataset; any of the datasets you mentioned can work.

Or was your question specific to a publication that relied on this repository?

@thegodone
Author

Indeed, it was more related to data processing and a publication reference:
What is the best way to do it?
For example, let's say with Pistachio: do I need to use the atom-mapped version or not? etc.
It would be nice to have a demo of the data processing on a few reactions.

@avaucher
Member

avaucher commented Aug 4, 2024

Hard to give a precise answer because some of the papers relied on previous and more "manual" versions of the code.

As an example, IIRC the atom-maps are removed by default.
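
To illustrate what removing atom-maps means on a reaction SMILES, here is a minimal sketch. It is not the code used in this repository: `strip_atom_maps` is a hypothetical helper, the example reaction is made up, and the approach is purely textual (a regular expression), so bracket atoms stay as written instead of being canonicalized. A cheminformatics toolkit would be the more robust choice for real data.

```python
import re

# Matches an atom-map suffix such as ":12" that sits directly before the
# closing bracket of a bracket atom, e.g. the ":1" in "[CH3:1]".
_ATOM_MAP = re.compile(r":\d+(?=\])")


def strip_atom_maps(rxn_smiles: str) -> str:
    """Remove atom-map numbers from a (reaction) SMILES string.

    Hypothetical helper for illustration only. It works on the text,
    so "[CH3:1]" becomes "[CH3]" and is not simplified to "C".
    """
    return _ATOM_MAP.sub("", rxn_smiles)


# Made-up atom-mapped reaction (methanol + acetyl chloride -> methyl acetate).
mapped = "[CH3:1][OH:2].[Cl:3][C:4](=[O:5])[CH3:6]>>[CH3:1][O:2][C:4](=[O:5])[CH3:6]"
unmapped = strip_atom_maps(mapped)
print(unmapped)
```

Running it on a handful of reactions from your dataset is a quick way to see whether the mapped and unmapped versions differ in any way that matters for your use case.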

My suggestion would be to start with https://github.com/rxn4chemistry/rxn-onmt-models?tab=readme-ov-file#the-easy-way, run the commands as provided, and see if what comes out is what you expected.

Calling the commands with --help also shows some useful options.

Let me know if this helps! Happy to help more, but I currently can't do much more than provide best-effort support.

@thegodone
Author

thegodone commented Aug 5, 2024

Thanks Alain,
Maybe a simpler option is to ask for the data from the "Completion of partial chemical equations" paper as a start, as suggested in its Data availability statement section?
Best, Guillaume

@avaucher
Member

avaucher commented Aug 5, 2024

Hi Guillaume,
Oh, in this case I'm of little help (not at IBM anymore). I'd suggest reaching out to Federico (email address in the paper) to ask for details.
Having said that, any dataset of sufficient size and quality should work.
