Best number of topic always goes to maximum of the topics range #107
-
Hello, I am running pycisTopic on a set of scATAC-seq datasets (a developmental time course with 8 timepoints). pycisTopic is run individually for each timepoint, and for every timepoint the range of models tested goes from 2 to 150 topics. My problem is that, for all datasets, it always ends up selecting the 150-topic model, which is weird (I would expect early developmental timepoints to have lower complexity than later ones). I am attaching a couple of metrics plots (from the earliest and the latest timepoints) to give you an idea. Do you have any idea what I can do to improve this? Thank you!
Replies: 1 comment
-
Hi @simozhou
Indeed, the automatic selection criterion is not the best one. It just takes the point where all metrics are maximised, which is often the model with the highest number of topics.
In practice we select the best model manually, based on where the curves start flattening (i.e. the model with the lowest number of topics at which most metrics are (semi-)maximised).
For example, in your 18-hour screenshot I would select a model with ~50 topics.
All the best,
Seppe
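
The "where the curves start flattening" rule described above can be approximated in code. The following is only a minimal sketch, not part of pycisTopic: `pick_n_topics`, its `tolerance` parameter, and the example metric values are all hypothetical. It assumes each metric has already been oriented so that higher is better, and it returns the smallest number of topics at which every metric is within `tolerance` of its own best value after min-max scaling. Visual inspection of the plots remains the reference; this only makes the heuristic explicit.

```python
import numpy as np


def pick_n_topics(n_topics, metrics, tolerance=0.05):
    """Smallest topic count at which every metric is near its maximum.

    Hypothetical helper, not a pycisTopic function.
    n_topics:  topic counts of the fitted models.
    metrics:   dict mapping metric name -> values (higher is better),
               in the same order as n_topics.
    tolerance: allowed shortfall from the best value, on a 0-1 scale.
    """
    n_topics = np.asarray(n_topics)
    order = np.argsort(n_topics)  # evaluate models from few to many topics
    n_topics = n_topics[order]

    near_max = np.ones(len(n_topics), dtype=bool)
    for values in metrics.values():
        v = np.asarray(values, dtype=float)[order]
        span = v.max() - v.min()
        # Min-max scale so metrics with different units are comparable.
        scaled = (v - v.min()) / span if span > 0 else np.ones_like(v)
        near_max &= scaled >= 1.0 - tolerance

    if not near_max.any():
        # No model satisfies all metrics: fall back to the largest model.
        return int(n_topics[-1])
    return int(n_topics[np.argmax(near_max)])  # first True = fewest topics


# Made-up saturating curves, for illustration only (not real results).
example_n = [2, 10, 25, 50, 100, 150]
example_metrics = {
    "metric_a": [1 - np.exp(-n / 15) for n in example_n],
    "metric_b": [1 - np.exp(-n / 12) for n in example_n],
}
print(pick_n_topics(example_n, example_metrics, tolerance=0.05))
```

Setting `tolerance=0.0` reproduces the behaviour complained about in the question (the model where everything is strictly maximised, typically the largest one), while a small positive tolerance moves the choice back to where the curves plateau.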