Merge pull request #18 from usnistgov/develop
Develop
knc6 authored Jul 6, 2021
2 parents 56a2be1 + 7ef12f2 commit f8d42a8
Showing 63 changed files with 1,461 additions and 60 deletions.
31 changes: 28 additions & 3 deletions README.md
@@ -7,24 +7,49 @@ Installation
-------------------------
First create a conda environment:
Install miniconda environment from https://conda.io/miniconda.html
Based on your system requirements, you'll get a file something like 'Miniconda3-latest-XYZ'.

Now,

```
bash Miniconda3-latest-Linux-x86_64.sh (for linux)
bash Miniconda3-latest-MacOSX-x86_64.sh (for Mac)
```
Download 32/64 bit python 3.6 miniconda exe and install (for windows)
Now, let's make a conda environment named "version" (choose any other name you like):
```
conda create --name version python=3.8
source activate version
```
-Now, let's install the package

+Now, let's install the package:
```
git clone https://github.com/usnistgov/alignn.git
cd alignn
python setup.py develop
```
-Example
+Examples
---------
Users can keep their structure files as POSCAR, .cif, or .xyz files in a directory. In the examples below we will use POSCAR format files. In the same directory, there should be an id_prop.csv file.
In this id_prop.csv, the filenames and corresponding target values are kept in comma-separated values (csv) format.
Here is an example of training OptB88vdw bandgaps of 50 materials from JARVIS-DFT. The example is created using the script provided in the script folder.
Users can modify the script to use more than 50 data points, or make their own dataset in this format. The dataset is split 80:10:10 into training, validation, and test sets.
The model is trained with the configuration parameters given in config_example_regrssion.json.

```
python alignn/scripts/train_folder.py --root_dir "alignn/examples/sample_data" --config "alignn/examples/sample_data/config_example_regrssion.json"
```
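
As a rough illustration of the layout described above, the sketch below writes an `id_prop.csv` and computes the sizes of an 80:10:10 split. It assumes one `filename,value` row per structure and no header row. The file names, target values, and helper names (`write_id_prop`, `split_counts`) are hypothetical and are not part of ALIGNN.

```python
import csv
import io


def write_id_prop(rows, handle):
    """Write (filename, target) pairs as comma-separated rows."""
    writer = csv.writer(handle, lineterminator="\n")
    for filename, target in rows:
        writer.writerow([filename, target])


def split_counts(n_total, train_pct=80, val_pct=10):
    """Return (train, val, test) sizes; the test set takes the remainder."""
    n_train = n_total * train_pct // 100
    n_val = n_total * val_pct // 100
    return n_train, n_val, n_total - n_train - n_val


# Hypothetical file names and target values, for illustration only.
rows = [("POSCAR-example-1.vasp", 0.0), ("POSCAR-example-2.vasp", 1.25)]
buffer = io.StringIO()  # stands in for open("id_prop.csv", "w", newline="")
write_id_prop(rows, buffer)
id_prop_text = buffer.getvalue()

# Split sizes for the 50-material example.
counts = split_counts(50)
```

Writing to a real file only requires swapping the `io.StringIO` buffer for an opened file handle in the same directory as the structure files.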
While the above example is for regression, the following example shows a classification task for metal/non-metal based on the above bandgap values. We transform the dataset
into 1 or 0 based on a threshold of 0.01 eV (controlled by the parameter 'classification_threshold') and train a similar classification model.
```
python alignn/scripts/train_folder.py --root_dir "alignn/examples/sample_data" --config "alignn/examples/sample_data/config_example_classification.json"
```
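
The thresholding step can be pictured with a small sketch. It assumes that values strictly above the threshold map to 1 and all others to 0; the exact comparison used inside ALIGNN is not shown in this README, and `binarize_targets` is a hypothetical helper name.

```python
def binarize_targets(values, classification_threshold=0.01):
    """Map each bandgap (eV) to 1 if it exceeds the threshold, else 0.

    Assumption: strictly-greater-than comparison; ALIGNN's own rule may differ.
    """
    return [1 if value > classification_threshold else 0 for value in values]


# Hypothetical bandgap values in eV, for illustration only.
bandgaps = [0.0, 0.01, 0.5, 2.3]
labels = binarize_targets(bandgaps)
```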
While the above regression example was for single-output values, we can train multi-output regression models as well.
An example is given below for training formation energy per atom, bandgap, and total energy per atom simultaneously. The script to generate the example data is provided in the script folder of the sample_data_multi_prop.
Another example, training electron and phonon density of states, is also provided.
```
python alignn/scripts/train_folder_multi_prop.py --root_dir "alignn/examples/sample_data_multi_prop" --config "alignn/examples/sample_data/config_example_regrssion.json"
```
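
Each structure in a folder such as sample_data_multi_prop is a POSCAR file. For readers unfamiliar with the format, here is a minimal sketch of a reader for the simple layout used by these sample files (comment line, scale factor, three lattice vectors, species, counts, coordinate mode, coordinates). It is not how ALIGNN or JARVIS-Tools parse structures, `parse_poscar` is a hypothetical helper, and optional features such as selective dynamics are deliberately not handled.

```python
def parse_poscar(text):
    """Parse a basic VASP5-style POSCAR string into a plain dictionary."""
    lines = [line.strip() for line in text.strip().splitlines()]
    scale = float(lines[1])
    # Lines 3-5 hold the lattice vectors, scaled by the factor on line 2.
    lattice = [[scale * float(x) for x in lines[i].split()] for i in (2, 3, 4)]
    species = lines[5].split()
    counts = [int(x) for x in lines[6].split()]
    mode = lines[7].lower()  # "direct" (fractional) or "cartesian"
    n_atoms = sum(counts)
    coords = [
        [float(x) for x in lines[8 + i].split()[:3]] for i in range(n_atoms)
    ]
    # Expand species by their counts: one element label per atom.
    elements = [s for s, c in zip(species, counts) for _ in range(c)]
    return {
        "lattice": lattice,
        "elements": elements,
        "mode": mode,
        "coords": coords,
    }


# Contents of POSCAR-JVASP-1372.vasp from the sample data.
ALAS_POSCAR = """System
1.0
3.5058938597621094 -3.081249e-10 2.0241289627124215
1.1686312862968669 3.3053879820023613 2.0241289627124215
-8.715088e-10 -6.162497e-10 4.048256928443838
Al As
1 1
direct
0.0 0.0 0.0
0.24999999999999997 0.25 0.24999999999999997
"""

structure = parse_poscar(ALAS_POSCAR)
```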

-You can also try multiple example scripts to run multiple dataset training. Look into the 'scripts' folder.
+You can also try multiple example scripts to run multiple dataset training. Look into the 'scripts' folder.
These scripts automatically download datasets from the jarvis.db.figshare module in JARVIS-Tools and train several models. Make sure you specify your queuing system details in the scripts.
2 changes: 1 addition & 1 deletion alignn/__init__.py
@@ -1,2 +1,2 @@
"""Version number."""
-__version__ = "2021.06.20"
+__version__ = "2021.07.05"
11 changes: 11 additions & 0 deletions alignn/config.py
@@ -72,6 +72,11 @@
"shear modulus",
"elastic anisotropy",
"U0",
"HOMO",
"LUMO",
"R2",
"ZPVE",
"omega1",
"mu",
"alpha",
"homo",
@@ -86,6 +91,7 @@
"A",
"B",
"C",
"all",
"target",
"max_efg",
"avg_elec_mass",
@@ -95,6 +101,9 @@
"_oqmd_stability",
"edos_up",
"pdos_elast",
"bandgap",
"energy_total",
"net_magmom",
]


@@ -112,10 +121,12 @@ class TrainingConfig(BaseSettings):
"mp_3d_2020",
"qm9",
"qm9_dgl",
"qm9_std_jctc",
"user_data",
"oqmd_3d_no_cfid",
"edos_up",
"edos_pdos",
"qmof",
] = "dft_3d"
target: TARGET_ENUM = "formation_energy_peratom"
atom_features: Literal["basic", "atomic_number", "cfid", "cgcnn"] = "cgcnn"
44 changes: 40 additions & 4 deletions alignn/data.py
@@ -21,7 +21,8 @@

# from sklearn.pipeline import Pipeline
import pickle as pk
-from sklearn.decomposition import PCA # ,KernelPCA

+# from sklearn.decomposition import PCA # ,KernelPCA
from sklearn.preprocessing import StandardScaler

# use pandas progress_apply
@@ -296,6 +297,29 @@ def get_train_val_loaders(
)
print("Converting target data into 1 and 0.")
all_targets = []

# TODO:make an all key in qm9_dgl
if dataset == "qm9_dgl" and target == "all":
print("Making all qm9_dgl")
tmp = []
for ii in d:
ii["all"] = [
ii["mu"],
ii["alpha"],
ii["homo"],
ii["lumo"],
ii["gap"],
ii["r2"],
ii["zpve"],
ii["U0"],
ii["U"],
ii["H"],
ii["G"],
ii["Cv"],
]
tmp.append(ii)
print("Made all qm9_dgl")
d = tmp
for i in d:
if isinstance(i[target], list): # multioutput target
all_targets.append(torch.tensor(i[target]))
@@ -347,8 +371,20 @@ def get_train_val_loaders(
if standard_scalar_and_pca:
y_data = [i[target] for i in dataset_train]
# pipe = Pipeline([('scale', StandardScaler())])
if not isinstance(y_data[0], list):
print("Running StandardScalar")
y_data = np.array(y_data).reshape(-1, 1)
sc = StandardScaler()

sc.fit(y_data)
print("Mean", sc.mean_)
print("Variance", sc.var_)
try:
print("New max", max(y_data))
print("New min", min(y_data))
except Exception as exp:
print(exp)
pass
# pc = PCA(n_components=output_features)
# pipe = Pipeline(
# [
@@ -357,9 +393,9 @@ def get_train_val_loaders(
# ]
# )
pk.dump(sc, open("sc.pkl", "wb"))
-pc = PCA(n_components=40)
-pc.fit(y_data)
-pk.dump(pc, open("pca.pkl", "wb"))
+# pc = PCA(n_components=10)
+# pc.fit(y_data)
+# pk.dump(pc, open("pca.pkl", "wb"))

if classification_threshold is None:
try:
11 changes: 11 additions & 0 deletions alignn/examples/sample_data_multi_prop/POSCAR-JVASP-10.vasp
@@ -0,0 +1,11 @@
System
1.0
1.6777483798834445 -2.9059452409270157 -1.1e-15
1.6777483798834438 2.9059452409270126 -7e-16
-6.5e-15 -8e-16 6.220805465667012
V Se
1 2
direct
0.0 0.0 0.0
0.6666669999999968 0.3333330000000032 0.7479606991085345
0.3333330000000032 0.6666669999999968 0.252039300891465
12 changes: 12 additions & 0 deletions alignn/examples/sample_data_multi_prop/POSCAR-JVASP-107772.vasp
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
System
1.0
4.376835486482439 0.0086562096165887 7.148251977291244
2.0211490103296166 3.8822313794698684 7.148251977291244
0.0142338540946976 0.008656214510917 8.38176620441039
Bi Sb
3 1
direct
0.11687114695010013 0.11687114695010009 0.11687114695134818
0.885057350916569 0.8850573509165686 0.8850573509144881
0.3806761740317465 0.3806761740317465 0.3806761740305207
0.6173953281015849 0.6173953281015848 0.6173953281036431
10 changes: 10 additions & 0 deletions alignn/examples/sample_data_multi_prop/POSCAR-JVASP-1372.vasp
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
System
1.0
3.5058938597621094 -3.081249e-10 2.0241289627124215
1.1686312862968669 3.3053879820023613 2.0241289627124215
-8.715088e-10 -6.162497e-10 4.048256928443838
Al As
1 1
direct
0.0 0.0 0.0
0.24999999999999997 0.25 0.24999999999999997
20 changes: 20 additions & 0 deletions alignn/examples/sample_data_multi_prop/POSCAR-JVASP-14014.vasp
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
System
1.0
4.157436115454804 -0.0 0.0
-0.0 7.038494037846648 0.0
0.0 0.0 7.411178065479046
Tb Mn Si
4 4 4
direct
0.25 0.5014870121743014 0.1863876024079978
0.75 0.49851298782569875 0.8136123975920022
0.25 0.0014870121743013 0.3136123975920026
0.75 0.9985129878256985 0.6863876024079979
0.75 0.8608859093979077 0.06027225572734521
0.25 0.1391140906020922 0.939727744272655
0.75 0.3608859093979077 0.43972774427265476
0.25 0.6391140906020923 0.5602722557273451
0.25 0.792634730215007 0.8919146343653592
0.75 0.207365269784993 0.10808536563464112
0.25 0.2926347302150071 0.6080853656346409
0.75 0.707365269784993 0.3919146343653588
15 changes: 15 additions & 0 deletions alignn/examples/sample_data_multi_prop/POSCAR-JVASP-14873.vasp
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
System
1.0
4.191262576674699 0.0 -0.0
-0.0 4.191262576674699 0.0
-0.0 0.0 4.191262576674699
Sr B
1 6
direct
0.0 0.0 0.0
0.2028453684309125 0.5 0.5
0.5 0.5 0.7971546315690875
0.5 0.5 0.2028453684309125
0.5 0.2028453684309125 0.5
0.5 0.7971546315690875 0.5
0.7971546315690875 0.5 0.5
12 changes: 12 additions & 0 deletions alignn/examples/sample_data_multi_prop/POSCAR-JVASP-15345.vasp
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
System
1.0
3.5666343258756448 0.0 0.0
0.0 3.60492483256326 -1.0516480500920402
0.0 0.0043800721536433 3.7551864245512623
Y Co C
1 1 2
direct
0.0 0.9970518040927455 0.00294819590736377
0.5 0.6150401254054609 0.3849598745944179
0.5 0.15192770422861318 0.5440196337270639
0.5 0.4559803662731792 0.8480722957711558
10 changes: 10 additions & 0 deletions alignn/examples/sample_data_multi_prop/POSCAR-JVASP-1996.vasp
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
System
1.0
3.93712543178282 0.0 2.273100275741533
1.3123751439276066 3.7119571065192623 2.273100275741533
0.0 0.0 4.546200551483066
Na I
1 1
direct
0.0 0.0 0.0
0.5 0.5 0.5
@@ -0,0 +1,9 @@
System
1.0
1.6712283e-08 -2.508029669761222 3.5458136263853106
-2.172017276374766 1.254014874098203 3.545813646368687
-2.17201803290572 -1.254014795663004 -3.5458136064019246
Xe
1
direct
0.0 0.0 0.0
13 changes: 13 additions & 0 deletions alignn/examples/sample_data_multi_prop/POSCAR-JVASP-22556.vasp
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
System
1.0
3.790914410660539 -0.0 0.0
0.0 3.790914410660539 0.0
-0.0 -0.0 3.790914410660539
Sr Fe O
1 1 3
direct
0.4999990000000025 0.4999990000000025 0.4999990000000025
0.0 0.0 0.0
0.4999990000000025 0.0 0.0
0.0 0.4999990000000025 0.0
0.0 0.0 0.4999990000000025
16 changes: 16 additions & 0 deletions alignn/examples/sample_data_multi_prop/POSCAR-JVASP-28397.vasp
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
System
1.0
0.0 -3.9587610833154616 0.0
-6.655928089533787 0.0 0.0
0.0 0.0 -23.94045079597872
Si S
4 4
direct
0.0 0.0 0.5217263648738928
0.0 0.5 0.5217263648738928
0.5 0.0 0.4661843287970869
0.5 0.5 0.4661843287970869
0.0524900550457348 0.7500000000000001 0.5792751627500841
0.9475099449542649 0.25 0.5792751627500841
0.4474867899417421 0.7500000000000001 0.40863414357892813
0.5525132100582582 0.25 0.40863414357892813
17 changes: 17 additions & 0 deletions alignn/examples/sample_data_multi_prop/POSCAR-JVASP-28565.vasp
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
System
1.0
3.3542337275744103 0.0 0.0
-1.6771168637872051 2.904850045503021 0.0002624763854255
0.0 0.0027191662067697 29.227006366170723
Te Mo W Se S
2 2 1 2 2
direct
0.3332869375319439 0.6665748750638851 0.40880470960271253
0.3334050587801155 0.6668121175602401 0.2788667649202174
0.3335186248083656 0.6670362496167267 0.11458088390572692
0.6666801663591748 0.33335933271835083 0.3438741864905265
0.3331138284834239 0.6662286569668452 0.5825261819744669
0.6664959135073409 0.3329918270146898 0.5252624785410095
0.6664002633646019 0.3327995267292049 0.639695963563112
0.6669034856871763 0.33380597137434753 0.06276093238425431
0.6667957214778616 0.3335914429557147 0.1664678986179713
20 changes: 20 additions & 0 deletions alignn/examples/sample_data_multi_prop/POSCAR-JVASP-28634.vasp
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
System
1.0
3.2250494729190726 2.216578e-10 2.163e-13
-1.6125242360333125 2.7929740932668943 -8.4754501616e-06
6.6135e-12 -9.7386036979e-05 34.14068378844537
Mo W Se S
1 3 2 6
direct
0.33331504683701274 0.6666300936741254 0.0934536896030271
0.33331656497272827 0.6666341299456554 0.47208556491119213
0.6666698924990063 0.33333978499784117 0.28247993950367123
0.6666991220003533 0.3333972440007849 0.6548932577640771
0.33334341592455036 0.6666888318491291 0.3328984548376194
0.33332779768550075 0.6666565953711356 0.2320623862553914
0.33337390316228616 0.6667488063245635 0.7006769649140224
0.6666482437297062 0.33329648745935525 0.0479187308537724
0.6666490346540366 0.33329706930798436 0.4262562969343839
0.6666480620607286 0.33329512412140816 0.1390473399063092
0.6666512247313715 0.33330144946262313 0.5179005541158488
0.33335669174272836 0.6667143834853922 0.6090748204006833
20 changes: 20 additions & 0 deletions alignn/examples/sample_data_multi_prop/POSCAR-JVASP-28704.vasp
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
System
1.0
3.292134155794691 0.0 0.0
-1.6460670778973454 2.8510786681143565 -3.615752048e-06
0.0 -0.0001768206441563 34.978246270650075
Mo W Se S
3 1 6 2
direct
0.3333192249247229 0.6666384498494459 0.0966675137812058
0.6666558392026922 0.3333126784053811 0.2791091321598709
0.6666926464398288 0.3333852928796507 0.6584266681105794
0.3333321581155246 0.666664316231042 0.468712621722382
0.3333230348691514 0.6666470697383001 0.3273013286877133
0.3333590853838843 0.6667201707677635 0.7065686651425439
0.6666558450522622 0.3333116901045159 0.4202334807678751
0.6666765519168195 0.3333521038336415 0.5171906710959798
0.3333210915930135 0.6666421831860199 0.23092332016136433
0.3333560781404684 0.6667131562809414 0.6102231914914819
0.6666426939013361 0.33328538780266426 0.0528105185440566
0.6666647504603075 0.33332850092061717 0.1405808883349591