Allow passing in cr/cl bounds and other settings #6
base: main
Conversation
Allow CPU execution. Fix GPU support. Fix module loading.
````diff
@@ -22,7 +22,7 @@ We need to put the data sets in the `dataset` folder. You can specify one data s
 ```bash
 # trained on the tic-tac-toe data set with one GPU.
-python3 experiment.py -d tic-tac-toe -bs 32 -s 1@16 -e401 -lrde 200 -lr 0.002 -ki 0 -mp 12481 -i 0 -wd 1e-6 &
+python3 experiment.py -d tic-tac-toe -bs 32 -s 1@16 -e401 -lrde 200 -lr 0.002 -ki 0 -mp 12481 -i cuda:0 -wd 1e-6 &
````
Note: see the review comment on the args.py changes.
```diff
@@ -51,7 +52,8 @@
     rrl_args.plot_file = os.path.join(rrl_args.folder_path, 'plot_file.pdf')
     rrl_args.log = os.path.join(rrl_args.folder_path, 'log.txt')
     rrl_args.test_res = os.path.join(rrl_args.folder_path, 'test_res.txt')
-    rrl_args.device_ids = list(map(int, rrl_args.device_ids.strip().split('@')))
+    rrl_args.device_ids = list(map(lambda id: torch.device(id), rrl_args.device_ids.strip().split('@'))) \
+        if rrl_args.device_ids else [None]
```
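The '@'-separated parsing above can be sketched in isolation; plain strings stand in for `torch.device` so the sketch runs without PyTorch, and `parse_device_ids` is a hypothetical helper name:

```python
def parse_device_ids(spec):
    """Split an '@'-separated device spec like 'cuda:0@cuda:1' into a list.

    Mirrors the PR's fallback: an empty spec yields [None] for a CPU-only
    run. (The PR wraps each token in torch.device; plain strings are kept
    here so the sketch has no PyTorch dependency.)
    """
    spec = spec.strip()
    return spec.split('@') if spec else [None]
```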
Note: I found that passing in an integer device ID pins the tensors to GPU memory, but GPU compute utilization stays at 0%, as shown by `nvidia-smi`. After changing the device ID to the one returned by `torch.device("cuda:0")`, the GPU is fully utilized. I do not know why that is, since a simple test using a Python loop does drive GPU utilization.
Example run passing in integer device ID:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   47C    P0    70W / 149W |    322MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     27173      C   ...vs/pytorch_p37/bin/python      319MiB |
+-----------------------------------------------------------------------------+
```
Example run passing in `cuda:*`:
```
Sat Dec  4 01:31:31 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   52C    P0   138W / 149W |   1739MiB / 11441MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     27346      C   ...vs/pytorch_p37/bin/python     1736MiB |
+-----------------------------------------------------------------------------+
```
```python
        # lower_bound: [continuous cols]
        # upper_bound: [continuous cols]
    }
    return settings
```
Note: I added this new settings file so that the user can pass in CR/CL bounds, as well as control normalization, one-hot encoding, etc. (those are currently hard-coded).
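A hypothetical sketch of such a settings dict; only `lower_bound`/`upper_bound` appear in the diff, the other keys are invented toggles for the behaviour that is currently hard-coded:

```python
def get_settings():
    # Invented key names except lower_bound/upper_bound, which follow the
    # commented fragment in the diff (one entry per continuous column).
    settings = {
        'normalize': False,
        'one_hot_encode_features': True,
        'lower_bound': [0.0, -1.0],
        'upper_bound': [1.0, 1.0],
    }
    return settings
```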
```python
        if self.left is not None and self.right is not None:
            if cl is not None and cr is not None:  # bounds are specified
                cl = torch.tensor(cl).type(torch.float).t()
                cr = torch.tensor(cr).type(torch.float).t()
```
Note: here we can pass in the cl/cr bounds directly.
```python
            cl = self.left + torch.rand(self.n, self.input_dim[1]) * (self.right - self.left)
            cr = self.left + torch.rand(self.n, self.input_dim[1]) * (self.right - self.left)
        else:
            cl = 3. * (2. * torch.rand(self.n, self.input_dim[1]) - 1.)
            cr = 3. * (2. * torch.rand(self.n, self.input_dim[1]) - 1.)
        assert torch.Size([self.n, self.input_dim[1]]) == cl.size()
        assert torch.Size([self.n, self.input_dim[1]]) == cr.size()
```
Note: and verify the shapes are correct.
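The branching and shape checks can be sketched end-to-end. This is a NumPy translation (so it runs without PyTorch) of the logic in the diff; `init_bounds` is a hypothetical name:

```python
import numpy as np

def init_bounds(n, d, cl=None, cr=None, left=None, right=None, seed=None):
    """Sketch of the bound initialization: explicit cl/cr win; otherwise
    sample uniformly in [left, right) when both are given; otherwise fall
    back to the original uniform [-3, 3) init. Returns (n, d) arrays."""
    rng = np.random.default_rng(seed)
    if cl is not None and cr is not None:  # bounds are specified
        cl = np.asarray(cl, dtype=np.float32).T
        cr = np.asarray(cr, dtype=np.float32).T
    elif left is not None and right is not None:
        cl = left + rng.random((n, d)) * (right - left)
        cr = left + rng.random((n, d)) * (right - left)
    else:
        cl = 3.0 * (2.0 * rng.random((n, d)) - 1.0)
        cr = 3.0 * (2.0 * rng.random((n, d)) - 1.0)
    # verify the shapes before the bounds reach the network
    assert cl.shape == (n, d) and cr.shape == (n, d)
    return cl, cr
```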
```python
        if self.device_id and self.device_id.type == 'cuda':
            self.net.cuda(self.device_id)
```
Note: this condition allows the program to run in CPU mode as well.
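The guard can be exercised in isolation; the stand-in classes below are invented stubs so the sketch runs without PyTorch (in the PR, `device` is a `torch.device` or `None` for CPU):

```python
class FakeDevice:
    """Stand-in for torch.device: only the .type attribute matters here."""
    def __init__(self, type_):
        self.type = type_

class FakeNet:
    """Stand-in for an nn.Module; records whether .cuda() was called."""
    def __init__(self):
        self.moved = False
    def cuda(self, device):
        self.moved = True

def maybe_move_to_cuda(net, device):
    # Only move when a CUDA device was actually requested; with None
    # (CPU mode) or a 'cpu' device the net stays where it is.
    if device and device.type == 'cuda':
        net.cuda(device)
    return net
```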
```diff
-        self.feature_enc = preprocessing.OneHotEncoder(categories='auto', drop=drop)
-        self.imp = SimpleImputer(missing_values=np.nan, strategy='mean')
+        self.feature_enc = preprocessing.OneHotEncoder(categories='auto', drop=drop) if one_hot_encode_features else None
+        self.imp = SimpleImputer(missing_values=np.nan, strategy='mean') if impute_continuous else None
```
Note: datasets that do not require, or already come with, one-hot encoding or imputation can now skip those steps.
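A minimal sketch of the conditional construction, assuming the flag names shown in the diff; `build_preprocessors` is a hypothetical wrapper, and `None` means "skip this step" downstream:

```python
import numpy as np
from sklearn import preprocessing
from sklearn.impute import SimpleImputer

def build_preprocessors(one_hot_encode_features=True, impute_continuous=True, drop=None):
    # Each preprocessing step becomes optional, matching the diff:
    # a disabled step is represented by None instead of a fitted transformer.
    feature_enc = (preprocessing.OneHotEncoder(categories='auto', drop=drop)
                   if one_hot_encode_features else None)
    imp = (SimpleImputer(missing_values=np.nan, strategy='mean')
           if impute_continuous else None)
    return feature_enc, imp
```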
Thank you very much for the PR. I am busy with other things now and will check the code after Dec 9.
Fix execution on CPU and GPU. Fix model loading.