Bugfix adaptive adjacency
* Previously used rand instead of randn, so there were no negative dot products.
* Softmax doesn't preserve 0 values, so switched to L1 normalisation (F.normalize with p=1).
boykovdn committed Nov 9, 2024
1 parent bf4c5e4 commit 0a883b6
Showing 3 changed files with 28 additions and 4 deletions.
2 changes: 1 addition & 1 deletion experiments/reproduction.py
@@ -23,7 +23,7 @@ def train():
tb_logger = pl_loggers.TensorBoardLogger(save_dir="logs/")
trainer = pl.Trainer(
accelerator=device,
- max_steps=100000,
+ max_steps=10000, # 100000,
limit_val_batches=1, # TODO Debugging
gradient_clip_val=5.0, # TODO There was something about this in the code.
logger=tb_logger,
24 changes: 24 additions & 0 deletions quarto/notes.qmd
@@ -400,6 +400,30 @@ Not sure off the top of my head where the massive memory footprint comes from, m
- [ ] Visualise the adaptive adjacency evolution using nx in tensorboard.
- [ ] Add PEMS-BAY dataset.

It looks like the dense_to_sparse function takes up a lot of time.
Perhaps I could try to optimise that operation?
The original implementation doesn't work with sparse matrices, so it doesn't run into the issues I'm having.
But sparse matrices are central to working with PyG, and to its scalability selling point.
It would be fun to try building a CUDA module for the sparsification, but for now the easiest solution might be to construct the sparse matrix manually.
Notice that I'm sparsifying a matrix of size (batch_size * N) x (batch_size * N), so the number of 0 entries scales as batch_size^2.
Ok, so it seems there is a PyTorch issue (https://github.com/pytorch/pytorch/issues/31942) about block_diag for sparse tensors.
It would likely be better to work on that instead, and it should make my implementation faster anyway.
No way forward without fixing that issue!
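
A minimal sketch of the manual-construction idea (hypothetical helper, not code from this repo): build the batched adjacency directly in COO form by offsetting the indices of a single (N, N) block per batch element, so the dense (batch_size * N) x (batch_size * N) block-diag matrix is never materialised.

```python
import torch

def sparse_block_diag(adj: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Repeat a dense (N, N) adjacency batch_size times along the diagonal,
    returning a sparse COO tensor without building the dense block-diag."""
    n = adj.shape[0]
    idx = adj.nonzero().T                      # (2, nnz) indices of one block
    val = adj[idx[0], idx[1]]                  # (nnz,) corresponding values
    # Shift each copy of the block by b * n along both rows and columns.
    offsets = torch.arange(batch_size).repeat_interleave(idx.shape[1]) * n
    batched_idx = idx.repeat(1, batch_size) + offsets.unsqueeze(0)
    batched_val = val.repeat(batch_size)
    return torch.sparse_coo_tensor(
        batched_idx, batched_val, size=(batch_size * n, batch_size * n)
    )
```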

Actually, couldn't you:
- Take powers before block-diagonalisation, then block-diag it.
- Yes, but you still have to sparsify a dense block-diag matrix, which is the bottleneck.
- Use torch.sparse.mm to compute the powers after sparsification (see the sketch after this list)?
- Yes, but that doesn't solve constructing the sparse block-diag.
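
For completeness, a minimal sketch of the torch.sparse.mm idea. It only covers the powers and assumes the sparse block-diag adjacency already exists, e.g. from the helper sketched above:

```python
import torch

def sparse_powers(adj_sparse: torch.Tensor, k_hops: int) -> list[torch.Tensor]:
    """Accumulate k-hop supports of a sparse COO adjacency without densifying."""
    powers = [adj_sparse]
    for _ in range(k_hops - 1):
        # torch.sparse.mm keeps the result sparse when both operands are
        # sparse COO tensors (supported in recent PyTorch versions).
        powers.append(torch.sparse.mm(powers[-1], adj_sparse))
    return powers
```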

How about using the torch geometric collation function to build the sparse matrix as a batch?
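
A rough sketch of that collation idea (hypothetical helper; assumes dense_to_sparse is run once on a single (N, N) block): wrap one copy of the graph in a Data object and let Batch.from_data_list apply the node-index offsets.

```python
import torch
from torch_geometric.data import Batch, Data
from torch_geometric.utils import dense_to_sparse

def collate_adjacency(adj: torch.Tensor, batch_size: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Build the batched edge_index / edge_weight via PyG collation instead of
    sparsifying a dense block-diagonal matrix."""
    edge_index, edge_weight = dense_to_sparse(adj)  # one (N, N) block only
    data = Data(edge_index=edge_index, edge_attr=edge_weight, num_nodes=adj.shape[0])
    batch = Batch.from_data_list([data] * batch_size)  # offsets node indices per copy
    return batch.edge_index, batch.edge_attr
```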

Well, turns out the issue is not with sparsifying the matrix.
It's to do with the 207x207 adjacency itself - applying softmax directly removes the 0s, hence I've been working with an effectively dense matrix.
Swapping softmax for L1 normalisation, and initialising the embeddings from a Gaussian, zeroes out about 50% of the entries and speeds up the computation.
Question is, does the adaptive adjacency tend towards sparsity on its own?
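
A toy illustration of why the swap matters (made-up numbers, not from the model):

```python
import torch
import torch.nn.functional as F

row = torch.tensor([[0.0, 2.0, 0.0, 1.0]])

# Softmax gives every entry non-zero mass, so the 0s vanish and the
# adjacency is effectively dense: ~[[0.083, 0.610, 0.083, 0.224]]
print(F.softmax(row, dim=1))

# L1 normalisation only rescales, so exact zeros stay exactly zero:
# [[0.0000, 0.6667, 0.0000, 0.3333]]
print(F.normalize(row, p=1, dim=1))
```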

Well, it seems to run at a more acceptable rate now, and it outperforms the no-adjacency case.

### References

6 changes: 3 additions & 3 deletions src/gwnet/model/gwnet.py
@@ -223,7 +223,7 @@ def __init__(
raise Exception(adp_err_msg)

self.node_embeddings = torch.nn.Parameter(
- torch.rand(n_nodes, adaptive_embedding_dim)
+ torch.randn(n_nodes, adaptive_embedding_dim)
)
adp = True

@@ -310,8 +310,8 @@ def _update_adp_adj(self, batch_size: int, k_hops: int) -> None:
self.global_elements["adj_weights"] = {}

# (N, C) @ (C, N) -> (N, N)
- adp_adj = F.softmax(
-     F.relu(self.node_embeddings @ self.node_embeddings.T), dim=1
+ adp_adj = F.normalize(
+     F.relu(self.node_embeddings @ self.node_embeddings.T), dim=1, p=1
)
adp_adj_dense_batch = torch.block_diag(*[adp_adj] * batch_size)

