Missing move to device in influence_model.fit() #569
@sleepymalc Thank you for reporting. Could you please provide more information so we can reproduce and fix the issue?
Sure, the following is an MWE:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, Sampler
from pydvl.influence.torch import EkfacInfluence
import random
import numpy as np

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define a simple MLP model
class MLP(nn.Module):
    def __init__(self, input_size=784, hidden_size=128, output_size=10, num_layers=2):
        super(MLP, self).__init__()
        self.flatten = torch.nn.Flatten()
        self.layers = torch.nn.ModuleList()
        self.layers.append(torch.nn.Linear(input_size, hidden_size))
        for _ in range(num_layers - 2):
            self.layers.append(torch.nn.Linear(hidden_size, hidden_size))
        self.layers.append(torch.nn.Linear(hidden_size, output_size))
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.flatten(x)
        for layer in self.layers[:-1]:
            x = self.relu(layer(x))
        x = self.layers[-1](x)
        return x

    def train_with_seed(self, train_loader, epochs=30, seed=0, verbose=True):
        torch.manual_seed(seed)
        random.seed(seed)
        np.random.seed(seed)
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.SGD(self.parameters(), lr=0.01, momentum=0.9)
        for epoch in range(epochs):
            running_loss = 0.0
            for images, labels in train_loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                outputs = self(images)
                loss = criterion(outputs, labels)
                loss.backward()
                optimizer.step()
                running_loss += loss.item()
            if verbose:
                print(f"Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}")
        print("Training complete")

    def test(self, test_loader):
        self.eval()
        correct = 0
        total = 0
        # No gradient is needed for evaluation
        with torch.no_grad():
            for images, labels in test_loader:
                images, labels = images.to(device), labels.to(device)
                outputs = self(images)
                # Get the predicted class from the maximum value in the output list of class scores
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
        accuracy = 100 * correct / total
        print(f'Accuracy of the model on the test set: {accuracy:.2f}%')

class SubsetSamper(Sampler):
    def __init__(self, indices):
        self.indices = indices

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=1, sampler=SubsetSamper(list(range(500))))
test_loader = DataLoader(test_dataset, batch_size=1, sampler=SubsetSamper(list(range(50))))

influence_model = EkfacInfluence(
    MLP().to(device),
    update_diagonal=True,
    hessian_regularization=0.001,
)
influence_model = influence_model.fit(train_loader)
```

When running it, I get:
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[3], line 23
     16 test_loader = DataLoader(test_dataset, batch_size=1, sampler=SubsetSamper(list(range(50))))
     18 influence_model = EkfacInfluence(
     19     MLP().to(device),
     20     update_diagonal=True,
     21     hessian_regularization=0.001,
     22 )
---> 23 influence_model = influence_model.fit(train_loader)

File ~/miniconda3/envs/influence/lib/python3.9/site-packages/pydvl/utils/progress.py:56, in log_duration.<locals>.decorator_log_duration.<locals>.wrapper_log_duration(*args, **kwargs)
     54 duration_logger.log(log_level, f"Function '{func_name}' is starting.")
     55 start_time = time()
---> 56 result = func(*args, **kwargs)
     57 duration = time() - start_time
     58 duration_logger.log(
     59     log_level,
     60     f"Function '{func_name}' completed. " f"Duration: {duration:.2f} sec",
     61 )

File ~/miniconda3/envs/influence/lib/python3.9/site-packages/pydvl/influence/torch/influence_function_model.py:1218, in EkfacInfluence.fit(self, data)
   1211 @log_duration(log_level=logging.INFO)
   1212 def fit(self, data: DataLoader) -> EkfacInfluence:
   1213     """
   1214     Compute the KFAC blocks for each layer of the model, using the provided data.
   1215     It then creates an EkfacRepresentation object that stores the KFAC blocks for
   1216     each layer, their eigenvalue decomposition and diagonal values.
   1217     """
-> 1218     forward_x, grad_y = self._get_kfac_blocks(data)
   1219     layers_evecs_a = {}
   1220     layers_evect_g = {}

File ~/miniconda3/envs/influence/lib/python3.9/site-packages/pydvl/influence/torch/influence_function_model.py:1198, in EkfacInfluence._get_kfac_blocks(self, data)
   1194 for x, *_ in tqdm(
   1195     data, disable=not self.progress, desc="K-FAC blocks - batch progress"
   1196 ):
   1197     data_len += x.shape[0]
-> 1198     pred_y = self.model(x)
   1199     loss = empirical_cross_entropy_loss_fn(pred_y)
   1200     loss.backward()

File ~/miniconda3/envs/influence/lib/python3.9/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/miniconda3/envs/influence/lib/python3.9/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

Cell In[2], line 18
     16 x = self.flatten(x)
     17 for layer in self.layers[:-1]:
---> 18     x = self.relu(layer(x))
     19 x = self.layers[-1](x)
     20 return x

File ~/miniconda3/envs/influence/lib/python3.9/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/miniconda3/envs/influence/lib/python3.9/site-packages/torch/nn/modules/module.py:1561, in Module._call_impl(self, *args, **kwargs)
   1558 bw_hook = hooks.BackwardHook(self, full_backward_hooks, backward_pre_hooks)
   1559 args = bw_hook.setup_input_hook(args)
-> 1561 result = forward_call(*args, **kwargs)
   1562 if _global_forward_hooks or self._forward_hooks:
   1563     for hook_id, hook in (
   1564         *_global_forward_hooks.items(),
   1565         *self._forward_hooks.items(),
   1566     ):
   1567         # mark that always called hook is run

File ~/miniconda3/envs/influence/lib/python3.9/site-packages/torch/nn/modules/linear.py:116, in Linear.forward(self, input)
    115 def forward(self, input: Tensor) -> Tensor:
--> 116     return F.linear(input, self.weight, self.bias)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
```

Hope this helps.
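The traceback pinpoints the cause: `_get_kfac_blocks` iterates the `DataLoader` and calls `self.model(x)` without first moving the batch to the model's device, so CPU inputs hit a CUDA model. Until a fix lands, one workaround is to hand `fit()` an iterable that moves batches itself. The sketch below assumes this approach; the `DeviceLoader` wrapper is hypothetical and not part of pyDVL:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

class DeviceLoader:
    """Hypothetical wrapper: yields batches from `loader` with every
    tensor moved to `device`, so a model on that device can consume them."""

    def __init__(self, loader, device):
        self.loader = loader
        self.device = device

    def __iter__(self):
        for batch in self.loader:
            yield tuple(
                t.to(self.device) if isinstance(t, torch.Tensor) else t
                for t in batch
            )

    def __len__(self):
        return len(self.loader)

# Small demo on CPU; the same pattern applies with device="cuda"
device = torch.device("cpu")
ds = TensorDataset(torch.randn(4, 2), torch.tensor([0, 1, 0, 1]))
wrapped = DeviceLoader(DataLoader(ds, batch_size=2), device)
for x, y in wrapped:
    assert x.device == device and y.device == device
```

The wrapped loader could then be passed to `fit()` in place of the original one, since `fit()` only iterates over its argument.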
@sleepymalc Thanks, that helped a lot. Please have a look at #570. To install via pip

To test it, I added a small call to the end of your file:

```python
# had to add this, due to an old CUDA version in combination with NaN values; try it if you need it
# torch.backends.cuda.preferred_linalg_library('magma')

influence_model = influence_model.fit(train_loader)

for x_train, y_train in train_loader:
    for x_test, y_test in test_loader:
        influence_model.influences(x_test, y_test, x_train, y_train)
        fac = influence_model.influence_factors(x_test, y_test)
        influence_model.influences_from_factors(fac, x_train, y_train)
        break
    break
```

Please let me know if this solves the problem. Thanks :)
It seems like the problem is solved! Thanks for the quick fix.
@sleepymalc Awesome, please let us know if you encounter any other issues. Thanks :)
When a model on "cuda" is used to construct the influence model, calling influence_model.fit() raises an error indicating that tensors are on different devices.
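The mismatch described here can be illustrated in isolation. This is a minimal sketch (not pyDVL code, and `model_device` is a hypothetical helper) showing that a forward pass succeeds once the input is moved to the device of the model's parameters:

```python
import torch
import torch.nn as nn

def model_device(model: nn.Module) -> torch.device:
    # Assumes all parameters live on a single device
    return next(model.parameters()).device

model = nn.Linear(4, 2)   # stays on CPU here; on a GPU machine: model.to("cuda")
x = torch.randn(3, 4)     # inputs start on CPU, as from a vanilla DataLoader
out = model(x.to(model_device(model)))  # moving the input avoids the device error
print(out.shape)  # torch.Size([3, 2])
```

This is exactly the kind of move that `fit()` was missing for its input batches.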