Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scripts to read go database #1

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
23 changes: 23 additions & 0 deletions examples/tools/go_db.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
#!/usr/bin/env python
import pyscf
from pyscf.tools.go_db import get_xyz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed go_db.py to get_xyz.py. Is it possible to remove this import statement entirely? I'd prefer to minimize the number of imports in the example file, and to minimize the number of total lines. Users can choose to add this line if they want to simplify the code later in the file, but ideally an "example" file would be supremely simple, like this one: https://github.com/pyscf/pyscf/blob/master/examples/scf/00-simple_hf.py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change will make it confusing for people, especially because the module and the function are called the same name. If we would need to remove the import statement then we need to put the function in the same script.
Is that something that you would like? If yes, it would contradict your other statement that we need to minimize number of lines in total.

The example at the moment is more compact in terms of number of lines when compared to having the function in the script. Let me know how you would like to tackle this.

import requests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When trying this code, I got an error, leading me to run pip install requests before running the program. Is it impossible to get this working without this dependency? The fewer dependencies, the easier it is for beginners to get started without seeing error messages :)

Copy link
Member Author

@vandan-revanur vandan-revanur Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get it to work, by downloading the database.txt file and putting the path to the file in the script.
However, I feel it is ugly and everyone would have to edit the script to put their respective paths.
The current approach is cleaner because people can just run the script without making any changes to the script.

However, if you prefer the file approach I can fix that.

Copy link
Member

@ndattani ndattani Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you clone PySCF, you get a database of basis sets (STO-3G, 6-31G, cc-pVDZ, def2-TZVPP, etc.) for each element in the periodic table, downloaded. Then PySCF can just read that database file directly. So the part about this being "ugly" because of needing to edit the script with a personalized "path", is not necessary.

The disadvantage of having the database shipped with PySCF is that it's huge. I won't try to include a database of this size in PySCF without prior approval (and perhaps some studies on what would be the best compression/decompression method, if any).

So for now, the URL approach is best. It is just so unfortunate that we need to import another package for this!


if __name__ == '__main__':
vandan-revanur marked this conversation as resolved.
Show resolved Hide resolved
VERBOSITY = 9
CONV_TOL = 1e-7
BASIS = 'sto-3g'
FROZEN = 4
vandan-revanur marked this conversation as resolved.
Show resolved Hide resolved
url = 'https://raw.githubusercontent.com/HPQC-LABS/goDatabase/master/benchmark_sets/gw5000.txt'
mol_name = 'C10H15NO2S2:GW5000.v0'

page = requests.get(url)
data = page.text

molecule_xyz = get_xyz(data, mol_name)

print(molecule_xyz)

neut_mol = pyscf.M(atom=molecule_xyz, basis=BASIS, verbose=VERBOSITY, spin=0, charge=0)

print(neut_mol)
8 changes: 8 additions & 0 deletions pyscf/tools/go_db.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
import re

def get_xyz(db_data:str, mol_name:str)-> str:
p = re.compile(f'(NAME = {mol_name})(.*?)Date modified:(.*?)\)(.*?)NAME', flags=re.DOTALL)
result = p.search(db_data)

xyz_info = result.group(4)
return xyz_info