
Check CUDA out of memory #213

Draft
thodkatz wants to merge 5 commits into main from check-cuda-out-of-memory

Conversation

@thodkatz (Collaborator) commented Aug 2, 2024

In conjunction with ilastik/ilastik#2891, our eventual goal is to find the maximum possible tensor shape that can fit into GPU memory for a simple forward pass.

To accomplish this, tiktorch should provide clients with the functionality to probe GPU memory.

Two procedures have been added:

  • IsCudaOutOfMemory(shape) -> bool: checks whether a given shape fits into GPU memory.
  • MaxCudaMemoryShape(minShape, maxShape, step) -> shape: for the range [minShape, maxShape] with valid increments of step, returns the maximum shape that fits into GPU memory.

Exposing the first one could still be valuable if we have a good guess of a shape, since the second one can be very time consuming, depending on the range.

The GPU memory-probing implementation is also inspired by plant-seg.
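For illustration only, a minimal sketch of the probing idea (hypothetical helper names and signatures, not the actual tiktorch implementation; it assumes a loaded torch model and plain tuple shapes):

```python
import torch


def is_cuda_out_of_memory(model: torch.nn.Module, shape, device: str = "cuda") -> bool:
    """Probe whether a forward pass with a dummy tensor of `shape` fits into GPU memory."""
    try:
        with torch.no_grad():
            dummy = torch.zeros(shape, device=device)
            model.to(device)(dummy)
        return False
    except RuntimeError as error:
        if "out of memory" in str(error).lower():
            torch.cuda.empty_cache()
            return True
        raise


def max_cuda_memory_shape(model, min_shape, max_shape, step, device: str = "cuda"):
    """One possible search strategy: walk down from max_shape towards min_shape
    in decrements of `step` and return the largest shape that still fits, or None."""
    n_steps = min(
        ((mx - mn) // s for mx, mn, s in zip(max_shape, min_shape, step) if s > 0),
        default=0,
    )
    for k in range(n_steps + 1):
        candidate = tuple(mx - k * s for mx, s in zip(max_shape, step))
        if not is_cuda_out_of_memory(model, candidate, device):
            return candidate
    return None
```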

Note for the review:
inference_pb2_grpc.py and inference_pb2.py are auto-generated by make protos.

@k-dominik (Collaborator) left a comment:

Awesome! Great that you got into tiktorch so quickly!

I left some minor comments and a question about how we want to see the interface for this functionality (basically ints vs namedints)...
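For context (illustrative only, not the final interface), the "ints vs namedints" question is roughly whether a shape crosses the API as a bare tuple of ints or as sizes tied to axis names:

```python
# plain ints: the order is implicit and has to match the model's axis order
shape = (1, 1, 256, 256)

# named ints: every size is bound to an axis name, so the order no longer matters
named_shape = {"b": 1, "c": 1, "y": 256, "x": 256}
```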

Resolved review threads:
  • environment.yml
  • proto/inference.proto (outdated)
  • tiktorch/server/grpc/inference_servicer.py
  • tiktorch/server/grpc/inference_servicer.py (outdated)
@thodkatz (Collaborator, Author) commented Aug 6, 2024

hey @k-dominik :)

I have attempted to refactor the design of the ModelInfo and the concept of shapes.

The previous design of ModelInfo was coupled to the representation of the ModelSession message used for the gRPC communication. The new design creates a more feature-rich interface to work easily with the concept of shape, and the conversion of the ModelInfo that needs to be transferred by the server is localized (info2session).

With this interface, clients can also transform the data from the gRPC channel back into a feature-rich object with session2info().

Do you think that the design is on the right track?
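Very roughly, the round trip described above could be pictured like this (a hypothetical sketch with made-up field names, not the code in the diff):

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ModelInfo:
    """Feature-rich, client-facing view of the model's shape metadata."""
    axes: str                  # e.g. "bcyx"
    min_shape: Dict[str, int]  # axis name -> minimal size
    steps: Dict[str, int]      # axis name -> increment per axis


def info2session(info: ModelInfo) -> dict:
    # server side: flatten the rich object into the ModelSession payload
    return {
        "axes": info.axes,
        "min_shape": [info.min_shape[a] for a in info.axes],
        "steps": [info.steps[a] for a in info.axes],
    }


def session2info(session: dict) -> ModelInfo:
    # client side: rebuild the rich object from the gRPC payload
    return ModelInfo(
        axes=session["axes"],
        min_shape=dict(zip(session["axes"], session["min_shape"])),
        steps=dict(zip(session["axes"], session["steps"])),
    )
```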

self._min_shape = min_shape
self._steps = steps
assert self._min_shape.is_same_axes(self._steps)
assert all(step == 0 for axis, step in steps if axis not in AxisWithValue.SPATIAL_AXES) # todo: ?
@thodkatz (Collaborator, Author) commented:

I am not sure whether this is valid. Does it make sense to allow non-zero step values for increments of non-spatial axes?

@k-dominik (Collaborator) left a comment:

For the refactor of ModelInfo: in general I like the idea of having something more directly usable :), good idea!
On the implementation details for the "rich", nicer-to-use objects, I left comments that mostly leaned towards subclassing dict, but they are probably outdated because I'm now thinking: what do you think about investigating whether bioimageio.spec classes can be reused for this purpose? At least for me, any removed layer of transformation would reduce cognitive load. On the other hand, this would bind us to bioimageio.spec, at least for the metadata. But since that's the format we're supporting (and the only one), I don't see much of a problem.
Of course then I immediately ask myself why even do the dance and not parse the model on the client as well, and be done with it (and we already do parse the spec on the client (ilastik) side)...

Also, ModelInfo (and the related convenience classes) would need a very heavy refactor when updating to 0.5 of the spec.

idk, what do you think @thodkatz?

Eight resolved (outdated) review threads on tiktorch/server/session/process.py.
@thodkatz marked this pull request as draft on August 7, 2024, 12:52
Commits:
  • Since we removed the ModelInfo interface, the protobuf messages for InputShape and OutputShape, and their conversions, are redundant.
  • Two procedures have been added: get the maximum tensor shape, and check whether a tensor's shape fits into memory.
  • The current interface supports multiple device ids. To validate a CUDA memory request (i.e. that a GPU is actually detected), a device id is required and is checked against the available devices, if any.
@thodkatz force-pushed the check-cuda-out-of-memory branch from 0c1cf06 to 46ac782 on August 19, 2024, 12:52
@thodkatz (Collaborator, Author) commented Aug 19, 2024

Since the PR removing ModelInfo was merged, I have rebased this PR.

  • I have removed some leftover code from the previous ModelInfo interface.
  • I have addressed the comments regarding GPU detection by adding a deviceId field to the CUDA memory requests and using the function check_gpu_exists.
  • Shapes are consistent across protobuf requests; a NamedInt is expected.
  • Additional tests for invalid MaxCudaMemoryShape requests.
  • Since we are going to rely on bioimageio spec classes, I have removed the fancy objects mentioned in the refactoring stage.

@thodkatz marked this pull request as ready for review on August 19, 2024, 13:07
def get_axes_with_size(axes: Tuple[str, ...], shape: Tuple[int, ...]) -> NamedShape:
    if len(axes) != len(shape):
        raise ValueError(f"{axes} and {shape} incompatible length. It should be equal")
    InputTensorValidator.is_shape(shape)
@thodkatz (Collaborator, Author) commented:

oops, this check should actually do something


def _check_shape_explicit(self, spec: nodes.InputTensor, tensor_shape: NamedShape):
    assert self.is_shape_explicit(spec)
    reference_shape = {name: size for name, size in zip(spec.axes, spec.shape)}
@thodkatz (Collaborator, Author) commented:

we could use the get_axes_with_size


def _check_shape_parameterized(self, spec: nodes.InputTensor, tensor_shape: NamedShape):
    assert isinstance(spec.shape, ParametrizedInputShape)
    if not self.is_shape(tensor_shape.values()):
@thodkatz (Collaborator, Author) commented Aug 20, 2024:

If this is part of get_axes_with_size, which returns a NamedShape where we assume a valid named shape (a map from axis names to their sizes, natural numbers), then we can remove this check from here and integrate it into get_axes_with_size. Maybe the Dict[str, int] can be an actual class, not just a type alias, to enforce this. I should maybe have kept the AxesWithValue.
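As a sketch of that idea (hypothetical and simplified, assuming get_axes_with_size stays roughly as in the diff), validation could move into the constructor of a small NamedShape class so callers can trust the result:

```python
from typing import Tuple


class NamedShape(dict):
    """Mapping of axis name -> size where every size is a positive integer."""

    def __init__(self, axes_with_size):
        for axis, size in axes_with_size.items():
            if not isinstance(size, int) or size <= 0:
                raise ValueError(f"axis {axis!r} needs a positive integer size, got {size!r}")
        super().__init__(axes_with_size)


def get_axes_with_size(axes: Tuple[str, ...], shape: Tuple[int, ...]) -> NamedShape:
    if len(axes) != len(shape):
        raise ValueError(f"{axes} and {shape} have incompatible lengths")
    # validation happens in the NamedShape constructor
    return NamedShape(dict(zip(axes, shape)))
```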

@thodkatz (Collaborator, Author) commented Aug 20, 2024

Now that I think about it, we can improve the design:

What we actually need is just validated xr.DataArrays, i.e. tensors, and not really plain shapes. The xr.DataArray is convenient since it already contains the logic of dims, which should represent the model axes. So we can utilize the Sample to create only validated Samples. That way we keep the validation in one place. Currently, there is no way to tell whether a sample is actually valid or not, and we don't actually need valid shapes, but valid tensors.

The xr.DataArray will also handle all the logic for potentially negative values in the shape, and some checks can be skipped.

The reasoning for needing valid tensors can be understood by checking the implementations of the procedures Predict, IsCudaOutOfMemory, and MaxCudaMemoryShape: eventually everything is realized as a tensor to be forwarded to the model.
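A possible sketch of that direction (hypothetical and simplified, not tiktorch's actual Sample class): construction checks the dims of each xr.DataArray against the model's axes, so everything downstream can assume valid tensors:

```python
import numpy as np
import xarray as xr


class ValidatedSample:
    def __init__(self, tensors):
        # tensors: mapping of tensor id -> xr.DataArray, already validated
        self.tensors = tensors

    @classmethod
    def from_spec(cls, tensors, spec_axes):
        # spec_axes: mapping of tensor id -> expected axis names from the model spec
        for tensor_id, tensor in tensors.items():
            expected = tuple(spec_axes[tensor_id])
            if tuple(tensor.dims) != expected:
                raise ValueError(f"tensor {tensor_id!r}: dims {tensor.dims} do not match model axes {expected}")
        return cls(tensors)


# usage: the dims carry the axis names, so no separate shape bookkeeping is needed
arr = xr.DataArray(np.zeros((1, 1, 64, 64)), dims=("b", "c", "y", "x"))
sample = ValidatedSample.from_spec({"input0": arr}, spec_axes={"input0": ("b", "c", "y", "x")})
```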

@k-dominik (Collaborator) left a comment:

Having this functionality is really great! I can already see us using GPUs more efficiently!

In general it would be great to add docstrings to methods, especially if they are some sort of API. Take IsCudaOutOfMemory as an example. Given the name I'd assume that one can query whether the current device is out of memory. But it looks like this is intended to check whether a given tensor with a given shape would fit.

Also, at least somewhere, the scope of what tiktorch supports in terms of models should be noted. When I read your code I think you assume single input tensors (the out-of-memory functionality should also add any additional tensors the model expects, with default sizes I guess, otherwise errors will be thrown); do you think that limitation would be hard to lift?

Currently the out of memory tests will also not run on mac due to the model having pytorch state-dict weights (there is currently a bug that prevents this from working). An easy workaround would be switching to torchscript. I think in current bioimage.spec/core this bug was fixed, but we're not there yet :). (Maybe we should add CI on a different platform, too... not in this PR ;))

I also noticed that there aren't any tests for InputTensorValidator - which would be great to have for completeness.

    return client.api.forward(sample)

def _check_gpu_exists(self, client: BioModelClient, device_id: str):
    gpu_device_ids = [device.id for device in self.__device_pool.list_devices() if device.id.startswith("cuda")]
@k-dominik (Collaborator) commented:

devices could also be "mps" (on apple silicon).

if len(gpu_device_ids) == 0:
    raise ValueError("No available GPUs found")
if device_id not in client.devices:
    raise ValueError(f"{device_id} not found for model {client.name}")
@k-dominik (Collaborator) commented:

I guess one additional check would be if device_id in gpu_device_ids?

Comment on lines +82 to +84
max_shape_arr = np.array(list(max_shape.values()))
min_shape_arr = np.array(list(param_shape.min_shape.values()))
step_arr = np.array(list(param_shape.step_shape.values()))
@k-dominik (Collaborator) commented:

I'm always afraid to rely on two dicts having the same order... The prior check_same_axes is invariant to the order of the keys. I'd probably do something like this, with a fixed reference for the order:

Suggested change:
-max_shape_arr = np.array(list(max_shape.values()))
-min_shape_arr = np.array(list(param_shape.min_shape.values()))
-step_arr = np.array(list(param_shape.step_shape.values()))
+max_shape_arr = np.array([max_shape[k] for k in max_shape])
+min_shape_arr = np.array([param_shape.min_shape[k] for k in max_shape])
+step_arr = np.array([param_shape.step_shape[k] for k in max_shape])

but maybe that's overdoing it - spec probably guarantees the order?

return max_shape
return None

def _is_cuda_out_of_memory(self, client: BioModelClient, tensor_id: str, shape: NamedShape) -> bool:
@k-dominik (Collaborator) commented:

This method seems to assume that there will be only a single tensor in the sample... While ilastik can only deal with one, tiktorch probably could deal with multiple inputs.

Comment on lines +248 to +249
MAX_SHAPE = (1, 1, 10, 10)
AXES = ("b", "c", "y", "x")
@k-dominik (Collaborator) commented:

Maybe note that these values relate to the model being used - otherwise it's a bit hard to follow how MAX_SHAPE is enforced (I was first looking for some monkeypatching somewhere).

@thodkatz (Collaborator, Author) commented:

Great! Thank you very much @k-dominik for the review once more :)

Totally agree with the statements, I will attempt to resolve them. Could you please also have a look at this comment (#213 (comment))? I think a little bit of refactoring will remove some redundant checks and make things more readable as well :)

@thodkatz (Collaborator, Author) commented Aug 20, 2024

> Currently the out of memory tests will also not run on mac due to the model having pytorch state-dict weights (there is currently a bug, ilastik/ilastik#2827, that prevents this from working). An easy workaround would be switching to torchscript. I think in current bioimage.spec/core this bug was fixed, but we're not there yet :). (Maybe we should add CI on a different platform, too... not in this PR ;))

Regarding this, actually without knowing it, I have attempted to fix the weight conversion functionality of the bioimage core in this PR! But I think that this should indeed be handled by the core, and tiktorch shouldn't know about the weights format. But yeah, let's leave the CI for another PR :)

> Also, at least somewhere, the scope of what tiktorch supports in terms of models should be noted. When I read your code I think you assume single input tensors (the out-of-memory functionality should also add any additional tensors the model expects, with default sizes I guess, otherwise errors will be thrown); do you think that limitation would be hard to lift?

Yep, you are right! I think it is time for the concept of Sample even from the protobuf perspective, so everything will be consistent with the spec too :) I would assume that the client should provide a Sample to check the out-of-memory functionality, so we don't really have to guess what to do with other tensors if they aren't specified. Although the Sample increases the complexity a lot: if you have, for example, two inputs with parameterized shapes and want to test a range of shapes, should I create a grid of all possible combinations, or should the shapes be updated synchronously?

@k-dominik (Collaborator) commented:

> Also, at least somewhere, the scope of what tiktorch supports in terms of models should be noted. When I read your code I think you assume single input tensors (the out-of-memory functionality should also add any additional tensors the model expects, with default sizes I guess, otherwise errors will be thrown); do you think that limitation would be hard to lift?

> Yep, you are right! I think it is time for the concept of Sample even from the protobuf perspective, so everything will be consistent with the spec too :) I would assume that the client should provide a Sample to check the out-of-memory functionality, so we don't really have to guess what to do with other tensors if they aren't specified. Although the Sample increases the complexity a lot: if you have, for example, two inputs with parameterized shapes and want to test a range of shapes, should I create a grid of all possible combinations, or should the shapes be updated synchronously?

It's a bit of a pity to reason about this still with the old spec in mind. In the new one, as I understood Fynn, one would go by reference axes... And I suppose yes, a (hyper-)grid would be the correct solution. But for some reason that sounds too complicated as a result. Poof, this needs more thinking...
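As a rough illustration of the (hyper-)grid approach being discussed (hypothetical helper names, assuming each input comes with (min_shape, max_shape, step) tuples): enumerate each input's valid shapes and take the cartesian product across inputs:

```python
import itertools


def candidate_shapes(min_shape, max_shape, step):
    """All shapes reachable from min_shape in increments of step without exceeding max_shape."""
    n = min(
        ((mx - mn) // s for mn, mx, s in zip(min_shape, max_shape, step) if s > 0),
        default=0,
    )
    return [tuple(mn + k * s for mn, s in zip(min_shape, step)) for k in range(n + 1)]


def shape_grid(inputs):
    """inputs: mapping of tensor id -> (min_shape, max_shape, step).
    Yields one mapping of tensor id -> shape per combination."""
    ids = list(inputs)
    per_input = [candidate_shapes(*inputs[tensor_id]) for tensor_id in ids]
    for combination in itertools.product(*per_input):
        yield dict(zip(ids, combination))
```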

@thodkatz marked this pull request as draft on October 11, 2024, 18:01