Speed difference b/w TetradSearch class and using jpype directly #35

Open
samblechman opened this issue Oct 8, 2024 · 5 comments

@samblechman

To get familiar with using Py-Tetrad, I wanted to run GRaSP-FCI on simulated data. Stealing bits of code from jpype_example.py, I did something like:

# assumes the JVM has been started via jpype as in jpype_example.py, e.g.:
# import jpype.imports; jpype.startJVM(classpath=["resources/tetrad-current.jar"])
import edu.cmu.tetrad.search as ts_

D, G = simulateLeeHastie(num_meas=10, samp_size=500)  # helper from jpype_example.py
test = ts_.test.IndTestConditionalGaussianLrt(D, 0.01, True)
score = ts_.score.DegenerateGaussianScore(D, True)
fci = ts_.GraspFci(test, score)
G_ = fci.search()
# then compare G_ to G...

This runs extremely quickly (< 1 second). However, when I use the TetradSearch class (TetradSearch.py), the computation time increases substantially using the same data:

import tools.translate as tr      # py-tetrad data-translation helpers
import tools.TetradSearch as ts

D, G = simulateLeeHastie(num_meas=10, samp_size=200)
df = tr.tetrad_data_to_pandas(D)  # convert the Tetrad DataSet to a pandas DataFrame
search = ts.TetradSearch(df)
search.use_conditional_gaussian_test(alpha=0.01)
search.use_degenerate_gaussian_score(penalty_discount=1)
G_ = search.run_grasp_fci()

I believe this is conceptually equivalent to the previous example, but it takes 2-3 orders of magnitude longer. If I use a purely discrete dataset, it runs quickly, in roughly the same amount of time as the direct tetrad.search approach.
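
For reference, here is a minimal sketch of how I measured the gap on one dataset (this assumes the objects from the two snippets above have already been constructed, and that run_grasp_fci returns the graph as used above):

import time

t0 = time.time()
G_direct = ts_.GraspFci(test, score).search()   # direct JPype path
print(f"direct jpype: {time.time() - t0:.2f} s")

t0 = time.time()
G_wrapped = search.run_grasp_fci()              # TetradSearch wrapper path
print(f"TetradSearch: {time.time() - t0:.2f} s")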

I am looking for help in understanding why this difference arises. Additionally, is there functionality available through the TetradSearch class that is not available via tetrad.search directly, or vice versa?

Thank you.

@jdramsey
Collaborator

jdramsey commented Oct 8, 2024

Well for one thing, in the first case you're converting the data from Tetrad to Python and back again... I wonder if that could account for it.
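
For context, a rough sketch of the round trip being described, using py-tetrad's translate helpers (the exact call names here are taken from tools/translate.py and should be treated as an assumption, not a prescription):

import tools.translate as tr

df = tr.tetrad_data_to_pandas(D)   # Tetrad DataSet -> pandas DataFrame (done by the user)
D2 = tr.pandas_data_to_tetrad(df)  # pandas DataFrame -> Tetrad DataSet (done inside TetradSearch)

# If column dtypes change across this round trip, the search may see different
# variable types than the direct JPype run does.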

@bja43
Collaborator

bja43 commented Oct 14, 2024

I see that you are using a mixed data-type simulation. Is it possible that the datatypes of your variables are getting messed up? Maybe in the second case some of the continuous variables are being treated as discrete?

@samblechman
Author

@jdramsey Just to clarify: is using jpype directly what amounts to "converting data from Tetrad to Python and back again", or is that what using the TetradSearch.py class does?

@bja43 Interesting thought. The slowdown when using the TetradSearch.py class doesn't occur with purely discrete data, but it does in the mixed case. If the mixed data were being treated as discrete, it would be faster, right?

@bja43
Collaborator

bja43 commented Oct 14, 2024

@samblechman I would expect a slowdown to occur if one or more continuous variables were being treated as discrete. For instance, if a continuous column in a dataset with 500 instances is treated as a discrete variable with 500 unique categories, that would probably be much slower.
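
A quick way to check for that (a diagnostic sketch of my own, not part of the library; df is the DataFrame passed to TetradSearch):

# print how each column is likely to be interpreted after the pandas round trip
for col in df.columns:
    print(col, df[col].dtype, df[col].nunique(), "unique values")
# a float dtype should be treated as continuous; an int/object column with hundreds
# of unique values suggests a continuous variable being treated as discrete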

@bja43
Collaborator

bja43 commented Oct 14, 2024

To be clear, I'm not sure if this is the issue, just something to consider!
