-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detailed molecular basis set specification (output of a calculation, not input) #12
Comments
I've updated the text above. The changes were mostly related to pure functions. |
From Wikipedia:
so o, q, r, t, u, v, w, x, y, z, ... |
@berquist Thanks! I'll fix this (and other remarks later) in the top post. |
I think I found it: https://bse.pnl.gov/bse/docs/schemas/gaussian.xsd I'll take a look. I think this paper explains the xml spec to some extent: http://pubs.acs.org/doi/pdf/10.1021/ci600510j Are there other references? There is also https://bse.pnl.gov/bse/docs/schemas/ecp.xsd for effective core potentials. |
Core potentials as well? |
@gkc1000 Core potentials are not as urgent because they are not needed to visualize orbitals or to compute densities. They could be useful for more specialized post-processing, e.g. follow-up calculations with other levels of theory |
The EMSL BSE XML has core potentials AFAIK. If its not in the document its in another. |
There are discussions that have started regarding the BSE, it's XML, the need for extending the format, and possibly having JSON. Just FYI.
… On Aug 18, 2017, at 6:20 AM, Daniel Smith ***@***.***> wrote:
The EMSL BSE XML has core potentials AFAIK. If its not in the document its in another.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
@wadejong : are these discussion done on an online forum, so we can follow up on them? |
Not at this point. We had a first preliminary discussion yesterday related to long term support for BSE and talking about simplification of the architecture. I discussed this schema effort too, seeing if we can connect the efforts and unify things. I will suggest an online discussion forum for this.
… On Aug 18, 2017, at 6:33 AM, Toon Verstraelen ***@***.***> wrote:
@wadejong : are these discussion done on an online forum, so we can follow up on them?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Thanks for the info. I looked at the EMSL schema for basis sets. These are my main thoughts. I hope it can be of use:
As it is, the EMSL scheme is not a one-stop solution here for two reasons: some features are missing and it is not meant to be used for describing molecular basis sets. |
In our extended Chemical JSON format currently implemented in a side version of NWChem we do have a complete format for Gaussian basis sets. Will drop it in when I get into the office.
I do like some of the ideas. Especially dealing with the generalized contractions. My format simply unpacks them one shell at a time, which leads to some replication. Using matrix formulations might be a good approach.
… On Aug 18, 2017, at 1:42 AM, Toon Verstraelen ***@***.***> wrote:
Rough idea
The current schema does not yet have standardized method for specifying a Gaussian (or Slater) AO basis for a given molecule in full detail. This is needed when interpreting atomic orbitals or density matrices in the output. A simple string like "3-21G" is not enough because different codes use different conventions needed for the interpretation and because some basis set strings have not such a well-defined meaning.
I think we can mostly JSON-ize the molden format (see [GTO], [STO] and [MO] sections of http://www.cmbi.ru.nl/molden/molden_format.html) plus some improvements:
Make the ordering of the basis functions within one shell (and pure versus Cartesian) explicit as follows:
{"conventions": {
"d": ["xx", "yy", "zz", "xy", "xz", "yz"],
"f": ["-3", "-2", "-1", "0", "+1", "+2", "+3"]
}}
Allow very high angular momenta: s, p, d, f, g, h, i, j, k, l, m, n, o. (Not sure what should be done after o?)
Assume contractions are normalized already with some convention, i.e. do not assume that the program reading the contraction coefficients will fix the normalization for you. Instead, the program writiing out the contractions should take care. (Different programs use different conventions, which is information that you do not want to discard. The Molden format assumes contractions need to be normalized upon reading, which is not suitable for all cases.)
Support (very) generalized contractions, i.e. also compatible with basis sets from CP2K. See https://github.com/cp2k/cp2k/blob/master/cp2k/data/BASIS_MOLOPT (It does not get more general than that.) For every shell we should have a list of angular momenta, e.g. ["s", "p"], ["s", "s"] or ["s", "s", "s", "p", "p", "d"]. The last example is something you could encounter in a molopt basis set.
Keep a list of centers for the shells, which can be different from atomic positions, e.g. when using ghost atoms or doing other funny things.
Include pseudopotential specification? (I'm not an expert.)
Details for STO basis functions need to be worked out. (I'm not an expert on that either.)
Example for NH3
{"basis": {
"type": "gto",
"conventions": {
"p": ["x", "y", "z"],
"d": ["0", "+1", "-1", "+2", "-2"],
},
"shells": [
[0, 8, ["s", "s"],
[0.9046E+4, 0.1357E+4, 0.3093E+3, 0.8773E+2, 0.2856E+2, 0.1021E+2, 0.3838E+1, 0.7466],
[0.6996174134E-3, 0.538605463E-2, 0.2739102119E-1, 0.103150592, 0.2785706633, 0.4482948495, 0.2780859284, 0.1543156123E-1],
[-0.304990096E-3, -0.2408026379E-2, -0.1194444873E-1, -0.489259929E-1, -0.1344727247, -0.3151125777, -0.2428578325, 0.1094382207E+1],
],
[0, 1, ["s"], [0.2248], [1.0]],
[0, 1, ["s"], [0.6124E-1], [1.0]],
[0, 3, ["p"],
[0.1355E+2, 0.2917E+1, 0.7973],
[0.5890567677E-1, 0.3204611067, 0.7530420618]],
[0, 1, ["p"], [0.2185], [1.0]],
[0, 1, ["p"], [0.5611E-1], [1.0]],
[0, 1, ["d"], [0.817], [1.0]],
[0, 1, ["d"], [0.23], [1.0]],
[1, 3, ["s"],
[0.1301E+2, 0.1962E+1, 0.4446],
[0.3349872639E-1, 0.2348008012, 0.8136829579]],
[1, 1, ["s"], [0.122], [1.0]],
[1, 1, ["s"], [0.2974E-1], [1.0]],
[1, 1, ["p"], [0.727], [1.0]],
[1, 1, ["p"], [0.141], [1.0]],
[2, 3, ["s"],
[0.1301E+2, 0.1962E+1, 0.4446],
[0.3349872639E-1, 0.2348008012, 0.8136829579]],
[2, 1, ["s"], [0.122], [1.0]],
[2, 1, ["s"], [0.2974E-1], [1.0]],
[2, 1, ["p"], [0.727], [1.0]],
[2, 1, ["p"], [0.141], [1.0]],
[3, 3, ["s"],
[0.1301E+2, 0.1962E+1, 0.4446],
[0.3349872639E-1, 0.2348008012, 0.8136829579]],
[3, 1, ["s"], [0.122], [1.0]],
[3, 1, ["s"], [0.2974E-1], [1.0]],
[3, 1, ["p"], [0.727], [1.0]],
[3, 1, ["p"], [0.141], [1.0]],
],
"centers": [
[-0.0140883131, 0.0845903925, 0.1037711513]
[1.4952113836, 0.0214187375, 0.0445603623]
[-0.5919457779, -1.6621666211, 0.5350312215]
[-0.7075176488, 0.4654193413, -2.0214243076]
]
}}
Details
The shells field is a list, where each item represents one generalized contraction stored as a list with the following items:
center index (counting from zero?)
the number of primitives
the angular momenta for the generalized contraction
the Gaussian exponents
one or more lists of contraction coefficients. (More are present in case of generalized contractions.)
To do
units
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
@wadejong Could you post the Gaussian basis JSON format please? No rush. Thanks! |
I added a directory Basis_Sets with an example Gaussian basis JSON file and
also an example that has an ECP in it to the QC_JSON repository.
…On Tue, Aug 22, 2017 at 8:35 AM, Toon Verstraelen ***@***.***> wrote:
@wadejong <https://github.com/wadejong> Could you post the Gaussian basis
JSON format please? No rush. Thanks!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGa9cs-lOecMbsuiAWTC5lHyyRyZ6pXfks5savVGgaJpZM4O7Q7s>
.
|
@wadejong Thanks! It seems that these JSON files have a purpose comparable to the EMSL XML scheme, i.e. describing basis functions for atoms. This is typically the input one would give to a QC code, which is somewhat different from the problem I'm trying to address, how to represent the molecular basis set in the output of a QC calculation. (See comment above on XML scheme.) After going through the https://github.com/MolSSI/QC_JSON_Schema/commit/ce3230167f4cc1276f1ff1128ff9c17b9759cdea, I wondered if it is preferable to work with dictionaries to represent the shells? I'm the idea above, I've just used a list in which the order of the elements fixes their meaning. That is not as self-explaining but it makes the JSON more compact. |
@tovrstra I mihgt be missing sth, but didn't we decide to represent |
I have been working on this in Chemical JSON, and use a number of related arrays to arrange the data in linear arrays following the large array convention we decided to adopt. We have a We have been able to use this along with coordinate sets and multiple MO coefficient arrays to animate a series of calculations where multiple calculations were performed on a system as it bonded. |
@cryos Thanks for the information! I have a few small questions: Is there a detailed description of this JSON format, e.g. also including normalization conventions of primitives and contracted basis functions? Would you have a smaller example too? (E.g. for testing purposes.) Which implementations can already write Chemical JSON or read it for visualization and post-processing? How does Chemical JSON handle generalized contractions? |
Coming back to this, we added a basis input specification from @bennybp which is a mimic of the data structure used in the new Basis Set Exchange. As a note this has been added here. This covers all use cases extant in the BSE and should provide good coverage of what is out there (@bennybp may have a few items to add). I believe the only item that does not exist is ordering. For ordering I am leaning a bit towards simply providing ordering arrays as per CCLib. I believe the only other viable option would be the CCA standard? |
Either of these options seems reasonable. There was a lot of support for just doing CCA at the recent ACS symposium - anecdotal, but.... |
@dgasmith Could you post a link to the CCA standard? Thanks. |
Thanks. I'll come back to it later. |
At the very least we could recommend the CCA ordering even if we support others. It would make things simpler in many cases. |
I would not mind fixing the ordering of basis functions and related conventions in QCSchema, as long as it is all clearly defined. Something like "the same as in program ABC" is not good enough because most other programs never explicitly define their conventions. Here are some examples of potential sources of confusion from the codes mentioned above (not meant as criticism, just to illustrate):
Can we include in the QCSchema documentation or definition the following information?
|
@twindus Do you have the CCA order for spherical basis sets? I think its 0, -/+ 1, ... On that particular topic there is the fun question of what to do with I am leaning a bit back towards saying that the AO quantities must be in CCA order. It puts slightly more onus on the generation of the schema data, but makes everything downstream much easier to handle. We can supply translation routines in QCElemental/QCEngine that can handle this.
|
Regarding ORCA, it could be specifically an issue in their routines to write out Molden files. When we read in Molden files with apparently incorrectly normalized orbitals, we run a series of checks to detect common deviations from the original file format. Orca has a sign change in pure AO functions from m>=3 on, which could (or not) reflect their internal conventions. Our settings for reading ORCA Molden files can be found here: https://github.com/theochem/iodata/blob/master/iodata/formats/molden.py#L393 |
From discussion with @wadejong and @cryos this week at Berkeley I think we are leaning towards having purely the CCA standard within the schema. This puts some additional effort upfront, but ultimately makes things far simpler. Looking at the CCA standard (https://onlinelibrary.wiley.com/doi/epdf/10.1002/jcc.20815) the spherical ordering is L, ..., 0, ..., -L. However, the libint documentation appears to describe the CCA ordering as -L, ..., 0, ..., L. |
the CCA specification of the solid harmonics ordering in our paper is incomplete and thus I cannot conclude whether what libint implemented is “correct” or “not”, e.g. it does not specify what according to this libint implements what the de-facto CCA standard defining code (MPQC2) defined: https://github.com/ValeevGroup/mpqc/blame/19ad332791d5966c98420d926364d68aeec859ae/src/lib/chemistry/qc/basis/transform.cc#L101 so I think this is what the de-facto CCA order is, since the specification in the paper is (unfortunately) incomplete |
@dgasmith @evaleev Thanks for sharing the info. I agree, even when stating the ordering is |
Sure. Attribution would be helpful. When are you planning to release the
schema?
…On Wed, Jun 5, 2019, 11:42 AM Daniel Smith ***@***.***> wrote:
@evaleev <https://github.com/evaleev> Can we borrow these definitions
from your living wikipedia page? @tovrstra <https://github.com/tovrstra>
could you work on writing up the format if this is the case?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12?email_source=notifications&email_token=AAQXIZ5GRL4XCGRD2XOBDMTPY7NELA5CNFSM4DXNB3WKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXAEB2I#issuecomment-499138793>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAQXIZZCA5EQEH45S7JYJK3PY7NELANCNFSM4DXNB3WA>
.
|
I wouldn't mind and attribution is obviously not a problem. I'm not sure what is meant by "living wikipedia page"? Could someone post a link? Thanks. |
That appears to be a autocorrection issue "libint wikipedia page", as seen here. @evaleev We have had a minimal version 1 out which carries energy-based quantities and some properties-- we took some time to explore how that works in practice. You should be seeing a larger push from MolSSI to get more quantities into the schema. |
@devinamatthews contributes a nice snippet for mapping basis function indices from Psi4 to CFOUR (C1 only).
|
Thanks @loriab. While this is an example of a one-off mapping (and it was hard enough to figure out even in C1 with no reorientation), it seems like using the MOLDEN format or a JSON format as discussed here could make this much more portable. Basically what I mean is that if someone were to write a tool that took two MOLDEN/JSON descriptions and generated the transformation matrix from one to the other, then reading in orbitals from another program would be as easy as reversing the MOLDEN-writer code and doing a matrix multiply. Such a tool would have to handle:
Maybe this is as easy as just evaluating a bunch of overlap integrals? |
Rough idea
The current schema does not yet have standardized method for specifying a Gaussian (or Slater) AO basis that was used in a calculation for a given molecule. Just to be clear: I'm looking for a way to fully specify a basis set in the output of a calculation, after the calculation has been carried out. This is different from specifying a basis set in the input for a calculation, because in the output you want to know how to interpret quantities expressed in an AO basis. For that you need more information, namely:
Because basis functions are not necessarily centered on nuclei, an array with centers must be included as well.
I think we can mostly JSON-ize the molden format (see [GTO], [STO] and [MO] sections of http://www.cmbi.ru.nl/molden/molden_format.html) plus some improvements:
Make the ordering of the basis functions within one shell (and pure versus Cartesian) explicit as follows:
Cartesian functions always match the
x*y*z*
regular expression. For pure functions, the string consists of four parts: sign (optional), angular momentum (letter code), magnetic quantum number (absolute value),s
orc
for sine- or cosine-like. The optional sign is needed because some codes (e.g. ORCA) have unusual sign conventions, different from most other QC codes. The line for the f-functions in the example is how ORCA orders basis functions (and flips signs) in AO arrays in a Molden file. (This is different from what the Molden program actually expects.) To fix the meaning of the strings for the pure functions completely, we should write out the mathematical form in the JSON schema.Allow very high angular momenta, exhausting the whole alphabet (except j, see Wikipedia, more authoritative reference always welcome):
["s", "p", "d", "f", "g", "h", "i", "k", "l", "m", "n", "o", "q", "r", "t", "u", "v", "w", "x", "y", "z", "a", "b", "c", "e"]
.Assume contractions are normalized already, i.e. do not assume that the program reading the contraction coefficients will fix the normalization for you. Instead, the program writiing out the contractions should take care of that. (The Molden format assumes contractions need to be normalized upon reading, which is not suitable for all cases.)
Support (very) generalized contractions, i.e. also compatible with basis sets from CP2K. See https://github.com/cp2k/cp2k/blob/master/cp2k/data/BASIS_MOLOPT (It does not get more general than that.) For every shell we should have a list of angular momenta, e.g.
["s", "p"]
,["s", "s"]
or["s", "s", "s", "p", "p", "d"]
. The last example is something you could encounter in a molopt basis set.Keep a list of centers for the shells, which can be different from atomic positions, e.g. when using ghost atoms or doing other funny things.
Include pseudopotential specification? (I'm not an expert.)
Details for STO basis functions need to be worked out. (I'm not an expert on that either.)
Example for NH3
Details
The
shells
field is a list, where each item represents one generalized contraction stored as a list with the following items:To do
The text was updated successfully, but these errors were encountered: