Unsure on how to create variable-width bins using rebinning algorithm #346
newtonharry started this conversation in General
Replies: 1 comment 1 reply
-
I think you may be able to do this if you keep a record of the fragments that cross the boundary between two bins. When you aggregate the counts, subtract the number of boundary-crossing fragments, since each of them was counted once in every bin it overlaps.
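A minimal sketch of that correction, using my own toy data and assuming fragment-based (non-insertion) counting, where a fragment increments every bin it overlaps, so a naive sum of two merged bins double-counts any fragment spanning their shared boundary:

```python
import numpy as np

# Toy fragments as (start, end) intervals; not data from the discussion.
fragments = [(10, 30), (95, 130), (190, 230), (350, 380)]
bin_size = 100
n_bins = 4

# Per-bin counts: each fragment contributes to every bin it overlaps.
counts = np.zeros(n_bins, dtype=int)
# crossing[b] = number of fragments spanning the boundary between bin b and b+1.
crossing = np.zeros(n_bins - 1, dtype=int)
for start, end in fragments:
    first = start // bin_size
    last = (end - 1) // bin_size
    counts[first:last + 1] += 1
    for b in range(first, last):
        crossing[b] += 1

# Merge adjacent pairs of bins: naive sum minus the double-counted
# boundary-crossing fragments at each merged boundary.
merged = counts[0::2] + counts[1::2] - crossing[0::2]
print(counts, crossing, merged)  # [2 2 1 1] [1 1 0] [3 2]
```

Here the merged count for the first pair is 3, not the naive 4, because one fragment straddles the boundary between the first two bins.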
-
Hi,
I recently asked a question noting discrepancies in the total read count between higher- and lower-resolution bins across the genome: #345
I now understand how different counting strategies, such as paired-insertion and fragment-based (non-insertion) methods, lead to variations in count values depending on the chosen bin resolution. I’m currently exploring how to implement an efficient rebinning function that can adjust bin sizes from lower to higher resolutions while maintaining consistency with the original counting strategy. To do this correctly, access to the fragment sparse matrix would likely be required.
However, I’m curious whether it's possible to construct lower-resolution bins directly from higher-resolution bins without referencing the fragment matrix. In essence, this approach would involve aggregating counts from the smaller bins into larger bins, preserving the viewpoint of the higher resolution. Would such a strategy be feasible?
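The aggregation idea above can be sketched with plain summation over adjacent columns of a cell-by-bin count matrix (my own illustration, not SnapATAC2 code). Note that with insertion-based counting strategies such as paired-insertion counting, a fragment whose two insertions land in different high-resolution bins contributes to both, whereas counting directly at the lower resolution may record it only once, which is one reason the aggregated counts can differ:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(5, 8))  # toy matrix: 5 cells x 8 high-res bins

factor = 4  # merge every 4 adjacent high-res bins into one low-res bin
# Reshape to (cells, low_res_bins, factor) and sum over the merged axis.
X_low = X.reshape(X.shape[0], -1, factor).sum(axis=2)
print(X_low.shape)  # (5, 2)
```

Summation preserves the matrix total, so any discrepancy with direct low-resolution counting comes from the counting strategy, not the aggregation itself.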
I've experimented with this concept and observed that my adaptive binning strategy yields higher Average Silhouette Width (ASW) scores. The challenge, however, is that the aggregated counts differ from what SnapATAC2 would generate if the data were processed directly at that lower resolution.
Could it be that the inverse document frequency (IDF) scaling applied during feature weighting is compensating for the increased counts in my approach? Since the aggregated higher-resolution bins inherently have higher counts, IDF might be down-weighting these values to offset the inflation caused by aggregation.
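To make the IDF intuition concrete, here is a hedged toy example of TF-IDF weighting as commonly used in scATAC-seq pipelines (a generic smoothed formula, not necessarily SnapATAC2's exact implementation): bins that are non-zero in more cells receive a smaller IDF weight, so broadly accessible aggregated bins are down-weighted:

```python
import numpy as np

# Toy cell-by-bin count matrix; the last bin is non-zero in every cell.
X = np.array([[1, 0, 3],
              [2, 1, 4],
              [0, 1, 5]], dtype=float)
n_cells = X.shape[0]

# Smoothed IDF: fewer cells containing the feature -> larger weight.
idf = np.log1p(n_cells / (1 + (X > 0).sum(axis=0)))
# Term frequency: normalize each cell's counts to sum to 1.
tf = X / X.sum(axis=1, keepdims=True)
tfidf = tf * idf
```

In this example the third bin, present in all cells, gets the smallest IDF weight, which is the kind of compensation you describe.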
I'm interested to hear your thoughts on this and to see where I'm going wrong or misunderstanding something fundamental.
Referring to the adaptive binning strategy, I initially developed an iterative solution to aggregate the bins, but I've shifted towards using summation matrices built with Kronecker products.
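A summation matrix of that kind can be built as the Kronecker product of an identity matrix with a row of ones, so aggregation becomes a single matrix multiply (my own sketch of the construction, assuming a uniform merge factor):

```python
import numpy as np

n, k = 8, 4  # 8 high-res bins, merged in groups of 4

# S has one row per low-res bin; each row is ones over its k source bins.
S = np.kron(np.eye(n // k), np.ones(k))  # shape (2, 8)

X = np.arange(16).reshape(2, 8).astype(float)  # toy 2 cells x 8 bins
X_low = X @ S.T  # shape (2, 2): each entry is a block sum
print(X_low)  # [[ 6. 22.] [38. 54.]]
```

For genome-scale matrices a sparse version of `S` (e.g. via `scipy.sparse.kron`) would avoid materializing the dense summation matrix.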
Thanks!