Column cluster distance vs Row cluster distance #4

Open
PierreLaplante opened this issue Apr 11, 2024 · 1 comment

PierreLaplante commented Apr 11, 2024

Hello Kevin,

Thank you for this tutorial that has been very useful (even 4 years later).

I have a question regarding the cluster distance metric you use, specifically regarding the difference between row and column distance.

You define the following:

clustering_distance_columns = function(x) as.dist(1 - cor(t(x))),
clustering_method_columns = 'ward.D2',
clustering_distance_rows = function(x) as.dist(1 - cor(t(x))),
clustering_method_rows = 'ward.D2',

I understand that, for rows (genes), you use 1 - Pearson correlation of the transposed matrix.

But I see that you use the same formula for the column (sample) clustering.
In the case of columns, shouldn't it be 1 - Pearson correlation of the matrix itself? For example:

clustering_distance_columns = function(x) as.dist(1 - cor(x))
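
To make the difference between the two formulas concrete, here is a minimal base-R sketch on a toy matrix. The matrix `mat` and its gene/sample names are invented for illustration and are not from the tutorial. It relies only on the fact that `cor()` correlates the columns of its input, so the result says nothing about which orientation ComplexHeatmap actually passes to the function.

```r
# Toy expression matrix: 5 genes (rows) x 3 samples (columns).
# Values and names are made up purely for illustration.
set.seed(1)
mat <- matrix(rnorm(15), nrow = 5, ncol = 3,
              dimnames = list(paste0("gene", 1:5), paste0("sample", 1:3)))

# cor() computes correlations between the COLUMNS of its input.
d_rows <- as.dist(1 - cor(t(mat)))  # columns of t(mat) are genes -> gene-gene distances
d_cols <- as.dist(1 - cor(mat))     # columns of mat are samples  -> sample-sample distances

attr(d_rows, "Size")  # number of items in the gene-gene distance object
attr(d_cols, "Size")  # number of items in the sample-sample distance object
labels(d_cols)        # names of the items being compared
```

Applied to the same genes-by-samples matrix, the two expressions therefore describe different sets of items; which one is appropriate for column clustering depends on whether the function is handed the matrix as-is or already transposed.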

I'm new to the field of RNAseq analysis, so forgive me if the question is naive, but I cannot visualise what it means to use the distance between rows as the metric for the column clustering.

Thank you for your insight, and have a good day.

kevinblighe (Owner) commented

Interesting question. ComplexHeatmap says this about these arguments:

clustering_distance_rows	
It can be a pre-defined character which is in ("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski", "pearson", "spearman", "kendall"). It can also be a function. If the function has one argument, the input argument should be a matrix and the returned value should be a dist object. If the function has two arguments, the input arguments are two vectors and the function calculates distance between these two vectors.


clustering_distance_columns
Same setting as clustering_distance_rows.

If you run with clustering_distance_columns = function(x) as.dist(1 - cor(x)), does it not throw an error?
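
One way to investigate this, sketched here under assumptions, is to wrap the distance function so that it reports the shape of the matrix it receives. The helper `dist_with_dims` is hypothetical and not part of ComplexHeatmap. The example only exercises it in plain R, and it has not been confirmed here what ComplexHeatmap passes for columns.

```r
# Hypothetical diagnostic helper (not part of ComplexHeatmap):
# same distance as in the tutorial, but it reports the input's shape.
dist_with_dims <- function(x) {
  message("distance function received a ", nrow(x), " x ", ncol(x), " matrix")
  as.dist(1 - cor(t(x)))  # distances between the ROWS of whatever x is
}

# Stand-alone check on a toy 5 genes x 3 samples matrix (made-up values).
set.seed(1)
mat <- matrix(rnorm(15), nrow = 5, ncol = 3)
d <- dist_with_dims(mat)
attr(d, "Size")  # number of items the dist object covers, i.e. nrow(x)
```

Supplying `dist_with_dims` as `clustering_distance_columns` in the heatmap call should then show, from the printed dimensions, whether the column function is given the matrix as-is or transposed. Since a one-argument function must return a dist object over the items being clustered, that shape determines whether `cor(t(x))` or `cor(x)` is the right form.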
