Add more notebooks assessing training effect
WardLT committed Oct 7, 2024
1 parent 9985ec3 commit e26cd8d
Showing 8 changed files with 788 additions and 483 deletions.
467 changes: 0 additions & 467 deletions scripts/evaluate-generated-mofs/0_effect-of-training.ipynb

This file was deleted.

467 changes: 467 additions & 0 deletions scripts/evaluate-generated-mofs/0_summarize-model-outcomes.ipynb

Large diffs are not rendered by default.

105 changes: 89 additions & 16 deletions scripts/evaluate-generated-mofs/1_effect-of-scale.ipynb

Large diffs are not rendered by default.

232 changes: 232 additions & 0 deletions scripts/evaluate-generated-mofs/2_effect-of-training.ipynb
@@ -0,0 +1,232 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "5211ca75-c0ed-4e81-ae37-2a21533656f1",
"metadata": {},
"source": [
"# Evaluate the Effect of Training\n",
"We can assess whether retraining DiffLinker improves performance in two ways:\n",
"1. Measure how much the success rate improves with retraining\n",
"2. Compare the total number of stable MOFs found with and without a closed loop"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "e0c564c2-ec6d-4d55-bc8e-944a80d35598",
"metadata": {},
"outputs": [],
"source": [
"from itertools import chain\n",
"from scipy.interpolate import interp1d\n",
"from pathlib import Path\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "c56bd56d-a264-4e11-a7ec-e3bb62268e42",
"metadata": {},
"source": [
"## Route 1: Measure Success Rate by Model Generation"
]
},
{
"cell_type": "markdown",
"id": "c8bb1426-af24-4159-b1c4-55e9c203bfde",
"metadata": {},
"source": [
"## Route 2: Assess Workflow Outcomes With and Without Retraining\n",
"Show that retraining increases the number of stable MOFs found."
]
},
{
"cell_type": "markdown",
"id": "10478466-dfd8-41c0-bf53-20dcb394fb91",
"metadata": {},
"source": [
"### Get the \"Stable Found\" at 90 minutes\n",
"Loop over all runs and store the scale (node count), whether DiffLinker was retrained, and the number of stable MOFs found after 90 minutes. \n",
"The 450-node run changes how it trains DiffLinker at around 90 minutes, and we do not want to study that effect yet."
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "aa46b598-7a7f-4f13-8a7c-c991ef4e2013",
"metadata": {},
"outputs": [],
"source": [
"hours = 1.5"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "98400659-f017-4b68-8ac7-1e9643c10c65",
"metadata": {},
"outputs": [],
"source": [
"success_data = []\n",
"for path in chain(\n",
"    Path('summaries').glob('*-nodes.csv.gz'),\n",
"    Path('summaries').glob('*-nodes_repeat-*.csv.gz'),\n",
"    Path('summaries').glob('*no-retrain*.csv.gz'),\n",
"):\n",
"    # Parse the node count and the retraining flag from the file name\n",
"    count = int(path.name.split(\"-\")[0])\n",
"    retrain = 'no-retrain' not in path.name\n",
"\n",
"    # Look up the cumulative number of stable MOFs found at the cutoff time\n",
"    mofs = pd.read_csv(path)\n",
"    num_found = interp1d(mofs['walltime'], mofs['cumulative_found'], kind='previous')(hours * 3600).item()\n",
"\n",
"    success_data.append({\n",
"        'nodes': count,\n",
"        'retrain': retrain,\n",
"        'found': num_found,\n",
"        'found_node-hr': num_found / (count * hours)\n",
"    })\n",
"success_data = pd.DataFrame(success_data)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "ac0c533f-f071-4e74-95d1-77fb0a9e17d9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>found</th>\n",
" <th>found_node-hr</th>\n",
" </tr>\n",
" <tr>\n",
" <th>nodes</th>\n",
" <th>retrain</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">32</th>\n",
" <th>False</th>\n",
" <td>133.0</td>\n",
" <td>2.770833</td>\n",
" </tr>\n",
" <tr>\n",
" <th>True</th>\n",
" <td>313.0</td>\n",
" <td>6.520833</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">64</th>\n",
" <th>False</th>\n",
" <td>426.5</td>\n",
" <td>4.442708</td>\n",
" </tr>\n",
" <tr>\n",
" <th>True</th>\n",
" <td>641.0</td>\n",
" <td>6.677083</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128</th>\n",
" <th>True</th>\n",
" <td>1622.0</td>\n",
" <td>8.447917</td>\n",
" </tr>\n",
" <tr>\n",
" <th>256</th>\n",
" <th>True</th>\n",
" <td>3633.0</td>\n",
" <td>9.460938</td>\n",
" </tr>\n",
" <tr>\n",
" <th>450</th>\n",
" <th>True</th>\n",
" <td>6554.0</td>\n",
" <td>9.709630</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" found found_node-hr\n",
"nodes retrain \n",
"32 False 133.0 2.770833\n",
" True 313.0 6.520833\n",
"64 False 426.5 4.442708\n",
" True 641.0 6.677083\n",
"128 True 1622.0 8.447917\n",
"256 True 3633.0 9.460938\n",
"450 True 6554.0 9.709630"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"success_data.groupby(['nodes', 'retrain']).mean()"
]
},
{
"cell_type": "markdown",
"id": "a973a7cb-e161-429e-9707-4f6b6a608bd2",
"metadata": {},
"source": [
"TBD: Make a plot"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "269787d5-5319-47f7-83ce-1a807bf14583",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
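The cutoff lookup in 2_effect-of-training.ipynb can be illustrated on a toy run log. The sketch below is not part of the commit: the walltimes, counts, and the file name `32-nodes_no-retrain.csv.gz` are made-up values chosen only to show how `interp1d(..., kind='previous')` returns the last recorded `cumulative_found` at or before the cutoff, and how that count is normalized per node-hour. The helper `parse_run_name` is a hypothetical name that mirrors the file-name parsing in the notebook.

```python
import pandas as pd
from scipy.interpolate import interp1d


def parse_run_name(name: str) -> tuple[int, bool]:
    """Hypothetical helper: read the node count and retraining flag from a file name."""
    count = int(name.split("-")[0])      # leading integer is the node count
    retrain = "no-retrain" not in name   # runs without retraining are tagged in the name
    return count, retrain


# Made-up run log: cumulative stable MOFs found versus walltime in seconds
mofs = pd.DataFrame({
    "walltime": [0.0, 1800.0, 3600.0, 5000.0, 7200.0],
    "cumulative_found": [0, 10, 25, 40, 60],
})

hours = 1.5
nodes, retrain = parse_run_name("32-nodes_no-retrain.csv.gz")  # assumed file name

# kind='previous' gives a step function: the value at the last sample <= the query time
step = interp1d(mofs["walltime"], mofs["cumulative_found"], kind="previous")
num_found = step(hours * 3600).item()  # query at 5400 s uses the sample at 5000 s

# Normalize by the compute spent up to the cutoff
found_per_node_hr = num_found / (nodes * hours)
print(nodes, retrain, num_found, found_per_node_hr)
```

With default settings, `interp1d` raises a `ValueError` when the query time lies outside the recorded walltime range, so a run shorter than the cutoff would need separate handling.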
Binary file modified scripts/evaluate-generated-mofs/figures/stability-over-time.png
Binary file not shown.