Speed difference b/w TetradSearch class and using jpype directly #35

Open
samblechman opened this issue Oct 8, 2024 · 5 comments

@samblechman

To get familiar with using Py-Tetrad, I wanted to run GRaSP-FCI on simulated data. Stealing bits of code from jpype_example.py, I did something like:

# assumes the JVM has been started via jpype as in jpype_example.py, e.g.:
# import jpype.imports; jpype.startJVM(classpath=["resources/tetrad-current.jar"])
import edu.cmu.tetrad.search as ts_

D, G = simulateLeeHastie(num_meas=10, samp_size=500)  # helper from jpype_example.py
test = ts_.test.IndTestConditionalGaussianLrt(D, 0.01, True)
score = ts_.score.DegenerateGaussianScore(D, True)
fci = ts_.GraspFci(test, score)
G_ = fci.search()
# then compare G_ to G...

This runs extremely quickly (< 1 second). However, when I use the TetradSearch class (TetradSearch.py), the computation time increases substantially using the same data:

import tools.translate as tr      # py-tetrad data-translation helpers
import tools.TetradSearch as ts

D, G = simulateLeeHastie(num_meas=10, samp_size=200)
df = tr.tetrad_data_to_pandas(D)  # convert the Tetrad DataSet to a pandas DataFrame
search = ts.TetradSearch(df)
search.use_conditional_gaussian_test(alpha=0.01)
search.use_degenerate_gaussian_score(penalty_discount=1)
G_ = search.run_grasp_fci()

I believe this is conceptually equivalent to the previous example, but it takes 2-3 orders of magnitude longer. If I use a purely discrete dataset, it runs quickly, in roughly the same amount of time as the direct tetrad.search approach.
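
For reference, here is a minimal sketch of how I measured the gap on one dataset (this assumes the objects from the two snippets above have already been constructed, and that run_grasp_fci returns the graph as used above):

import time

t0 = time.time()
G_direct = ts_.GraspFci(test, score).search()   # direct JPype path
print(f"direct jpype: {time.time() - t0:.2f} s")

t0 = time.time()
G_wrapped = search.run_grasp_fci()              # TetradSearch wrapper path
print(f"TetradSearch: {time.time() - t0:.2f} s")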

I am looking for help in understanding why this difference arises. Additionally, is there functionality available through the TetradSearch class that is not available via tetrad.search directly, or vice versa?

Thank you.

@jdramsey
Collaborator

jdramsey commented Oct 8, 2024

Well for one thing, in the first case you're converting the data from Tetrad to Python and back again... I wonder if that could account for it.
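
For context, a rough sketch of the round trip being described, using py-tetrad's translate helpers (the exact call names here are taken from tools/translate.py and should be treated as an assumption, not a prescription):

import tools.translate as tr

df = tr.tetrad_data_to_pandas(D)   # Tetrad DataSet -> pandas DataFrame (done by the user)
D2 = tr.pandas_data_to_tetrad(df)  # pandas DataFrame -> Tetrad DataSet (done inside TetradSearch)

# If column dtypes change across this round trip, the search may see different
# variable types than the direct JPype run does.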

@bja43
Collaborator

bja43 commented Oct 14, 2024

I see that you are using a mixed data-type simulation. Is it possible that the datatypes of your variables are getting messed up? Maybe in the second case some of the continuous variables are being treated as discrete?

@samblechman
Author

@jdramsey Just to clarify: is using jpype directly what amounts to "converting data from Tetrad to Python and back again", or is that what using the TetradSearch.py class does?

@bja43 Interesting thought. The slowdown when using the TetradSearch.py class doesn't occur with purely discrete data, but it does in the mixed case. If the mixed data were being treated as discrete, it would be faster, right?

@bja43
Collaborator

bja43 commented Oct 14, 2024

@samblechman I would expect a slowdown to occur if one or more continuous variables were being treated as discrete. For instance, if a continuous column in a dataset with 500 instances is treated as a discrete variable with 500 unique categories, that would probably be much slower.
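
A quick way to check for that (a diagnostic sketch of my own, not part of the library; df is the DataFrame passed to TetradSearch):

# print how each column is likely to be interpreted after the pandas round trip
for col in df.columns:
    print(col, df[col].dtype, df[col].nunique(), "unique values")
# a float dtype should be treated as continuous; an int/object column with hundreds
# of unique values suggests a continuous variable being treated as discrete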

@bja43
Collaborator

bja43 commented Oct 14, 2024

To be clear, I'm not sure if this is the issue, just something to consider!
