Is Implementation of R4 method available in Causal-cmd? #69

Open
f-arab opened this issue Jun 9, 2022 · 45 comments

Comments

@f-arab

f-arab commented Jun 9, 2022

Hi,

I'm planning to use the "R4" method, but I couldn't find it in either the Tetrad GUI or causal-cmd. The "Introduction" section of https://bd2kccd.github.io/docs/causal-cmd/ says that R4 is among the available algorithms; however, when I check the algorithms in causal-cmd, it only shows the following list:
Algorithm: fas, fask, fci, fcimax, fges, fges-mb, fofc, ftfc, gfci, glasso, imgs_cont, imgs_disc, lingam, mbfs, mgm, multi-fask, pc-all, r-skew, r3, rfci, skew, ts-fci, ts-gfci, ts-imgs

Could you please tell me where I can find the implementation of the R4 method?

Thanks

@jdramsey
Contributor

jdramsey commented Jun 9, 2022

The R4 method was deprecated and removed. What is your application? Perhaps I could recommend a better algorithm for your purpose?

@jdramsey
Contributor

jdramsey commented Jun 9, 2022

Also, it looks like you're currently using an old version of Tetrad. This is a bad website, but it does list the up-to-date version.

https://sites.google.com/view/tetradcausal

The Tetrad GUI there is up to date; tetrad-cmd will be up to date within a week I think.

@f-arab
Author

f-arab commented Jun 9, 2022

Thank you for your quick response. Our application is causal discovery from large-scale fMRI data. We have task fMRI data, so the recordings are not very long, but we would like to cover the whole brain. We would therefore prefer (though do not strictly need) something that can handle tens to hundreds of nodes. Thanks.

@jdramsey
Contributor

jdramsey commented Jun 9, 2022

Honestly, ever since I hooked GitHub up into Slack I see queries from people almost right away.

Have you seen this paper?

https://direct.mit.edu/netn/article/3/2/274/2211/Estimating-feedforward-and-feedback-effective

We did this work a few years ago, but the Two-Step algorithm in particular uses similar ideas to R4 and has pretty good performance. The FASK algorithm is in Tetrad and has good performance for fMRI as well.

@f-arab
Author

f-arab commented Jun 10, 2022

Great! Yes, we have also looked at the FASK method, which is very promising; however, we haven't evaluated its results on our data yet. Thanks for the recommendation. I'll try the Two-Step algorithm as well (it seems that a MATLAB implementation is available).

@f-arab
Author

f-arab commented Dec 13, 2022

Hi @jdramsey ,

I have a question about the runtime of the FASK method on large graphs, such as the simulated Macaque data in the FASK paper. When we run FASK on the Full Macaque data, it takes a few hours, as opposed to the few minutes mentioned in the FASK paper, although our machine is quite powerful. I am running FASK with the default parameter values, except for the alpha parameter, which we treat as a hyperparameter. Do any other parameters, such as the penalty discount (c), also affect the speed here? If so, how can we set that one when running FASK in causal-cmd?

Thanks

@jdramsey
Contributor

jdramsey commented Dec 14, 2022 via email

@f-arab
Author

f-arab commented Dec 14, 2022 via email

@rubenSaro

Hi Fahimeh,

Sorry for the late response. Can you let me know the exact values of the parameters you are using for FASK? Also, how are you selecting alpha as a hyperparameter?
Thanks.

Ruben

@f-arab
Author

f-arab commented Dec 21, 2022 via email

@rubenSaro

So you are using the default parameters of causal-cmd. @jdramsey, would you happen to know what those are? I ran a test using the Tetrad GUI (latest version from here: https://s01.oss.sonatype.org/content/repositories/releases/io/github/cmu-phil/tetrad-gui/7.1.0/tetrad-gui-7.1.0-launch.jar).
Using an alpha of 10^-7, it ran in seconds. I wonder what the problem could be.
Can you send us an example of the code you are running and the Tetrad version you are using?
Thanks,

ruben

@f-arab
Author

f-arab commented Dec 21, 2022 via email

@f-arab
Author

f-arab commented Dec 21, 2022 via email

@jdramsey
Contributor

jdramsey commented Dec 21, 2022 via email

@f-arab
Author

f-arab commented Dec 21, 2022 via email

@jdramsey
Contributor

jdramsey commented Dec 22, 2022 via email

@rubenSaro

rubenSaro commented Dec 22, 2022 via email

@f-arab
Author

f-arab commented Dec 22, 2022 via email

@jdramsey
Contributor

jdramsey commented Dec 22, 2022 via email

@jdramsey
Contributor

jdramsey commented Dec 22, 2022 via email

@f-arab
Author

f-arab commented Dec 22, 2022 via email

@jdramsey
Contributor

jdramsey commented Dec 22, 2022 via email

@jdramsey
Contributor

@f-arab @rubenSaro I might be of more help on this in a few days; I just downloaded that macaque data and have been playing with it.

@f-arab
Author

f-arab commented Dec 23, 2022

Thanks, @jdramsey. This new version accepts the --default option. I'll let you know if I get the expected speed up after adding this option for FASK.

No worries. Hope you feel better soon. Just let me know whenever you get a chance to run my code.

@jdramsey
Contributor

jdramsey commented Dec 23, 2022

Great! I will try your code. I've been mainly playing with the examples in the Tetrad interface, and I'm getting a pretty good idea of how to get it to work, so hopefully that can be translated back into parameters for causal-cmd. (I got that idea from @rubenSaro above.) There have been a number of changes to Tetrad since that article was published, but the main issue for reproducibility, I think, is that I don't know which exact concatenations of the 500-variable single datasets were used to produce that table in the article. (@rubenSaro, do you know?) The Box site gives 60 10-dataset concatenations (5000 variables each), but I think that table uses only one for each of the 10-, 20-, etc. dataset concatenations. So I don't know if exact reproducibility is possible, though one could get a general idea, I think.

I'll go back through the article to see which exact parameters were being used. I know they were being tuned. When I do my own tuning manually I can actually get better results than were reported in that table for FASK for the 10-dataset concatenations, but my surmise is that I'm not using exactly the same concatenation, same problem as I just said above.

For instance with one of the concatenations (#34) for the long-range Macaque simulations, if I use FAS Stable with a p-value cutoff of 1e-7, with FASK 1 with a bias of -0.1 (the standard bias for FASK1--that is, the original FASK which I think was used in the article) I get these numbers:

  AP    0.95   Adjacency precision
  AR    0.401  Adjacency recall
  AHP   0.95   Arrowhead precision
  AHR   0.354  Arrowhead recall
  AHPC  1      Arrowhead precision (common edges)
  AHRC  0.76   Arrowhead recall (common edges)

This comes back in less than a second, and the adjacency recall is better than what is reported in that table in the Feedbacks article. But mileage can vary.
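
For readers unfamiliar with these statistics, here is a minimal sketch of how adjacency and arrowhead precision and recall are conventionally computed from a true graph and an estimated graph. The edge-list representation and the function names are illustrative assumptions, not Tetrad's implementation, and the "common edges" variants (AHPC, AHRC) are not covered.

```python
def _ratio(hits, total):
    # Undefined (NaN) when there is nothing to count against.
    return hits / total if total else float("nan")


def adjacency_stats(true_edges, est_edges):
    """Precision and recall over adjacencies, ignoring edge direction."""
    true_adj = {frozenset(edge) for edge in true_edges}
    est_adj = {frozenset(edge) for edge in est_edges}
    hits = len(true_adj & est_adj)
    return _ratio(hits, len(est_adj)), _ratio(hits, len(true_adj))


def arrowhead_stats(true_edges, est_edges):
    """Precision and recall over arrowheads; an edge (a, b) means a --> b."""
    true_heads, est_heads = set(true_edges), set(est_edges)
    hits = len(true_heads & est_heads)
    return _ratio(hits, len(est_heads)), _ratio(hits, len(true_heads))


# Toy graphs (hypothetical): the estimate misses Z --> W and reverses Y --> Z.
true_graph = [("X", "Y"), ("Y", "Z"), ("Z", "W")]
estimated = [("X", "Y"), ("Z", "Y")]

print(adjacency_stats(true_graph, estimated))
print(arrowhead_stats(true_graph, estimated))
```

In this toy case both estimated adjacencies are correct (precision 1.0) but only two of the three true adjacencies are found (recall 2/3), which is the same high-precision, low-recall pattern as in the numbers above.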

(Nevertheless obviously with careful tuning Two Step got much better recall for the concatenation used in the article.)

So what I need to do then is set this up in causal-cmd and see how it works I think.

But like I said, for reproducibility we definitely need @rubenSaro's input.

@f-arab
Author

f-arab commented Dec 29, 2022

Thanks, @jdramsey. I really appreciate your time and help on this.

@rubenSaro

Hi @f-arab,

Any improvement in the running time?

Sorry, I could not see the image you posted about running time. Can you send it again?

Given the results shown by the three of us, I am inclined to think that the problem comes from the high alpha values used in your experiment. My hypothesis is that the combination of a large sample size and a high alpha may produce dense graphs (few edges are removed in each pass of FAS), and therefore a very large number of conditioning sets that need to be tested.
Could you just make one run with an alpha of 1e-7 (p-value cutoff)?
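
To give a rough sense of the scale involved, the sketch below counts the conditioning sets that could be drawn for a single edge X - Y from the other neighbours of X. It is a worst-case upper bound under the usual PC/FAS-style search, not a measurement of what Tetrad actually tests (the search stops early once an independence is found), and the function name is hypothetical.

```python
from math import comb


def max_conditioning_sets(degree, depth=-1):
    """Upper bound on conditioning sets for one edge X - Y, seen from X.

    degree: number of nodes adjacent to X (including Y).
    depth:  largest conditioning-set size allowed; -1 means no limit.
    """
    others = degree - 1  # candidates for the conditioning set exclude Y
    max_size = others if depth < 0 else min(depth, others)
    return sum(comb(others, size) for size in range(max_size + 1))


for degree in (5, 10, 20, 30):
    print(degree, max_conditioning_sets(degree), max_conditioning_sets(degree, depth=3))
```

The unbounded count doubles with every extra neighbour, so a graph that stays dense after the first passes can become impractical to search, while a small depth keeps the count polynomial in the degree.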

Ruben

@f-arab
Author

f-arab commented Jan 5, 2023

Hi @rubenSaro,

Yes, you are right. When I change the alpha value for the full macaque data to 1e-7, it becomes very fast in Tetrad as well. The reason I tried the full macaque data with alpha = 0.1 is that your paper mentions that alpha = 0.1 gave the best results for that data. With small values of alpha, FASK is super fast (when I run it through Tetrad) and the precision is very high; however, the recall is very low. So maybe that's why you also tried higher values of alpha, to increase the recall for the full macaque graph.

There is still one thing that I don't understand: why are the execution times so much higher when I use FASK with these hyperparameters in causal-cmd?

By the way, is there any way to use Tetrad through a script, or does it have to be done manually through the GUI? Let's say we want to run FASK in Tetrad for all 60 repetitions of the full macaque data. What is the best way to do that?

Regarding timing, here is what I got for FASK using causal-cmd with different values of alpha (these are the results for data from simple networks 1 to 9):

[image: FASK running times in causal-cmd for different values of alpha, simple networks 1 to 9]

@f-arab
Author

f-arab commented Jan 5, 2023

Hi @jdramsey ,

I tried running FASK through Tetrad with all the hyperparameters you mentioned for repetition #34 of the long-range macaque data. I got the same results as you have shown, and it was also very fast. Thanks.

@rubenSaro

rubenSaro commented Jan 5, 2023 via email

@rubenSaro

rubenSaro commented Jan 5, 2023 via email

@kvb2univpitt
Contributor

If you're using causal-cmd version >= 1.4.1, use the --default switch to apply the default parameter values set in Tetrad. You can still override specific parameters when using this switch.

@f-arab
Author

f-arab commented Jan 6, 2023 via email

@f-arab
Author

f-arab commented Jan 6, 2023

Thanks, @rubenSaro. I'm running causal-cmd 1.4.1.

@rubenSaro

rubenSaro commented Jan 20, 2023 via email

@rubenSaro

rubenSaro commented Jan 20, 2023 via email

@f-arab
Author

f-arab commented Jan 26, 2023

Thanks, @rubenSaro. Oh, I see! I tried it now with alpha = 0.1 for the orientation part and 1e-7 for FAS-stable, and it is now fast (I actually tried it in Tetrad, and I still have to see how to set these correctly in causal-cmd). Anyway, thanks for the clarification.

@jdramsey
Contributor

jdramsey commented Feb 8, 2023

@f-arab @rubenSaro Is this issue closed? Can I close it?

@rubenSaro

rubenSaro commented Feb 8, 2023 via email

@rubenSaro

Hi @f-arab,

Sorry for the late response.

I was wondering how the implementation is going, and I am also curious about the overall goal of this work. As algorithm developers, we are always interested in how these tools are used and how we can help users make the most of them.

Also, I talked with @jdramsey some weeks ago about the differences in the most recent implementation of FASK and about which of the new parameter names map to those used in the paper implementation.

I highly recommend checking the FASK algorithm and all its parts here: https://tinyurl.com/2ku26aen
It is the supplementary material of the paper.
I refer to the algorithms that compose FASK in the notes below.

This is the list of parameters that FASK uses in the command-line version you were using.


faskAdjacencyMethod: 1 # runs FAS-Stable (the one used in the paper). See Algorithm 2.

depth: -1 # controls the size of the conditioning sets in the independence tests. Setting this to a small integer may reduce the running time but can also result in false positives. -1 means that all possible sizes are checked.

test: sem-bic-test # test for FAS adjacency

score: sem-bic-score

semBicRule: 1 # sets the Chickering rule, used in the original FASK

penaltyDiscount: 2 # applies when sem-bic is used as the independence test (as in the paper). In the paper this is referred to as c. See steps 1 and 10 in Algorithm 2 (FAS-Stable).

skewEdgeThreshold: 0.3 # see the description of the FASK algorithm and step 11 in Algorithm 1 (FASK). Threshold for adding edges that may not have been inferred because there was a positive/negative cycle that results in a non-zero observed relation.

faskLeftRightRule: 1 # runs FASK v1, the original FASK from the paper

faskDelta: -0.3 # see steps 1 and 11 in Algorithm 4 (this is the value set in the paper)

twoCycleScreeningThreshold: 0 # not used in the original paper implementation; added afterwards. You can set it to 0.3, for example, to use it as a filter for running Algorithm 3 (2-cycle detection), which may take some time to run.

orientationAlpha: 0.1 # referred to in the paper as TwoCycle Alpha or just alpha. The lower it is, the lower the chance of inferring a two-cycle. See steps 17 to 28 in Algorithm 3 (2-Cycle Detection Rule).

structurePrior: 0 # prior on the number of parents. Not used in the paper implementation.

So a command-line run would look like this:

java -Xmx10G -jar causal-cmd-1.4.1-jar-with-dependencies.jar --delimiter tab --data-type continuous --dataset concat_BOLDfslfilter_60_FullMacaque.txt --prefix Fask_Test_MacaqueFull --algorithm fask --faskAdjacencyMethod 1 --depth -1 --test sem-bic-test --score sem-bic-score --semBicRule 1 --penaltyDiscount 2 --skewEdgeThreshold 0.3 --faskLeftRightRule 1 --faskDelta -0.3 --twoCycleScreeningThreshold 0 --orientationAlpha 0.1 --structurePrior 0
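
To repeat a run like this over several datasets (for example, the repetitions of the macaque data discussed earlier in this thread), the command can be assembled in a short script. The sketch below only builds and prints the commands. The option names and values are copied from the command above, with the JVM heap option placed before -jar and structurePrior written with two dashes like the other options; the dataset file names and the function name are hypothetical placeholders.

```python
import shlex

JAR = "causal-cmd-1.4.1-jar-with-dependencies.jar"

# FASK parameters as listed in this comment (name -> value).
FASK_PARAMS = {
    "faskAdjacencyMethod": 1,
    "depth": -1,
    "test": "sem-bic-test",
    "score": "sem-bic-score",
    "semBicRule": 1,
    "penaltyDiscount": 2,
    "skewEdgeThreshold": 0.3,
    "faskLeftRightRule": 1,
    "faskDelta": -0.3,
    "twoCycleScreeningThreshold": 0,
    "orientationAlpha": 0.1,
    "structurePrior": 0,
}


def build_fask_command(dataset, prefix, params=FASK_PARAMS, heap="10G"):
    """Return the causal-cmd invocation as an argument list."""
    cmd = [
        "java", f"-Xmx{heap}", "-jar", JAR,
        "--delimiter", "tab",
        "--data-type", "continuous",
        "--dataset", dataset,
        "--prefix", prefix,
        "--algorithm", "fask",
    ]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Hypothetical file names; replace with the actual dataset paths.
for rep in range(1, 4):
    dataset = f"macaque_full_rep{rep:02d}.txt"
    print(shlex.join(build_fask_command(dataset, f"Fask_rep{rep:02d}")))
```

To actually execute a command rather than print it, the argument list can be passed to `subprocess.run(cmd, check=True)`.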

Hope this helps

@rubenSaro

@jdramsey
Contributor

@rubenSaro I'm going to put these param docs into the Fask docs...

@jdramsey
Contributor

@f-arab @rubenSaro, Can this issue be closed yet? I wasn't sure if you were still working on it.

@rubenSaro

rubenSaro commented Mar 16, 2023 via email

@f-arab
Author

f-arab commented Mar 28, 2023

Thanks, @rubenSaro and @jdramsey. Sorry for my late reply. I'll try the new parameter values and let you know how it goes. In the previous setting, I was not able to run FASK (computationally) on large graphs while setting alpha to 0.1, which made the graphs denser. I'll try these new settings.

I'm good to close it as well. I might come back at some point again :)

Thanks


4 participants