
[Question] How to perform backpropagation for a conv + sigmoid layer? #113

Open
zhewenhu opened this issue Sep 24, 2024 · 3 comments

@zhewenhu
Hi,

I have implemented the forward pass as a convolution followed by a SIGMOID_FWD activation and am now working on the backward pass of the graph. However, according to the documentation, a graph of SIGMOID_BWD + dgrad/wgrad is not supported. I tried building that graph anyway and got the error: No valid engine configs for SIGMOID_BWD_ConvBwdData_. Does cuDNN offer any alternative way to implement this backward pass?

Here is my code for fprop:

graph_fwd = std::make_shared<fe::graph::Graph>();
graph_fwd->set_io_data_type(fe::DataType_t::FLOAT)
    .set_intermediate_data_type(fe::DataType_t::FLOAT)
    .set_compute_data_type(fe::DataType_t::FLOAT);

X = graph_fwd->tensor(fe::graph::Tensor_attributes()
                    .set_name("input")
                    .set_dim({n, c, h, w})
                    .set_stride({c * h * w, 1, c * w, c}));

W = graph_fwd->tensor(fe::graph::Tensor_attributes()
                    .set_name("weight")
                    .set_dim({k, c, r, s})
                    .set_stride({c * r * s, 1, c * s, c}));

auto conv_options =
    fe::graph::Conv_fprop_attributes().set_padding({0, 0}).set_stride({1, 1}).set_dilation({1, 1});
conv_output = graph_fwd->conv_fprop(X, W, conv_options);

auto sigmoid_options = fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::SIGMOID_FWD);
Y = graph_fwd->pointwise(conv_output, sigmoid_options);

conv_output->set_output(true);
Y->set_output(true);

And here is the dgrad code I attempted, which fails with the error No valid engine configs for SIGMOID_BWD_ConvBwdData_:

graph_d_bwd = std::make_shared<fe::graph::Graph>();
graph_d_bwd->set_io_data_type(fe::DataType_t::FLOAT)
    .set_intermediate_data_type(fe::DataType_t::FLOAT)
    .set_compute_data_type(fe::DataType_t::FLOAT);

dY = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("grad")
                        .set_dim({n, k, h, w})
                        .set_stride({k * h * w, 1, k * w, k}));

W_bwd = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("weight")
                        .set_dim(W->get_dim())
                        .set_stride(W->get_stride()));

conv_output_bwd = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("conv_output")
                        .set_dim(conv_output->get_dim())
                        .set_stride(conv_output->get_stride()));

auto dsigmoid_options = fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::SIGMOID_BWD);
auto dsigmoid_output = graph_d_bwd->pointwise(dY, conv_output_bwd, dsigmoid_options);
dsigmoid_output->set_dim({n, k, h, w});

auto dgrad_options = fe::graph::Conv_dgrad_attributes().set_padding({0, 0}).set_stride({1, 1}).set_dilation({1, 1});
dX = graph_d_bwd->conv_dgrad(dsigmoid_output, W_bwd, dgrad_options);
dX->set_dim({n, c, h, w}).set_output(true);
@Anerudhan (Collaborator)

Hi @zhewenhu ,

Thanks for posting this. Unfortunately, cuDNN does not support this fused backward graph pattern.

Instead, the suggestion is to split it into two graphs: one that computes dSigmoid and another that does dgrad.

Let us know if you have a specific use case in mind.

Thanks

@zhewenhu (Author) commented Sep 24, 2024

Hi @Anerudhan ,

I also tried splitting them, but the standalone SIGMOID_BWD graph is also not supported; I get the same error: No valid engine configs for SIGMOID_BWD_. Could you check whether I did something wrong?

Here is the code:

graph_d_bwd = std::make_shared<fe::graph::Graph>();
graph_d_bwd->set_io_data_type(fe::DataType_t::FLOAT)
    .set_intermediate_data_type(fe::DataType_t::FLOAT)
    .set_compute_data_type(fe::DataType_t::FLOAT);

dY = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("grad")
                        .set_dim({n, k, h, w})
                        .set_stride({k * h * w, 1, k * w, k}));

conv_output_bwd = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("conv_output")
                        .set_dim(conv_output->get_dim())
                        .set_stride(conv_output->get_stride()));

auto dsigmoid_options = fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::SIGMOID_BWD);
auto dsigmoid_output = graph_d_bwd->pointwise(dY, conv_output_bwd, dsigmoid_options);
dsigmoid_output->set_dim({n, k, h, w}).set_output(true);

@Anerudhan (Collaborator)

Hi @zhewenhu ,

I just tried this on an H100, and the code passes there. Which GPU are you running on?

Thanks
