Speed difference between TetradSearch class and using jpype directly #35
Comments
Well, for one thing, in the first case you're converting the data from Tetrad to Python and back again... I wonder if that could account for it.
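For context, a minimal sketch of the round trip being described might look like the following; the module path `tools.translate` and the helper `pandas_data_to_tetrad` are assumptions based on the snippets below, not confirmed API.

```python
# Hypothetical sketch: Tetrad DataSet -> pandas DataFrame -> back to a Tetrad DataSet.
import tools.translate as tr            # module path assumed, mirroring tools.TetradSearch below

df = tr.tetrad_data_to_pandas(D)        # Java -> Python copy (usage confirmed in the issue below)
# D2 = tr.pandas_data_to_tetrad(df)     # Python -> Java copy (helper name assumed)
```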
I see that you are using a mixed data-type simulation. Is it possible that the datatypes of your variables are getting messed up? Maybe in the second case some of the continuous variables are being treated as discrete?
@jdramsey Just to clarify, is using jpype directly "converting data from Tetrad to Python and back again", or is that what using the TetradSearch.py class amounts to? @bja43 Interesting thought. The slowdown when using the TetradSearch.py class doesn't occur when using just discrete data, but it does in the mixed case. If mixed data were being treated as discrete, it would be faster, right?
@samblechman I would expect a slowdown to occur if one or more continuous variables were being treated as discrete. For instance, if a continuous column in a dataset with 500 instances were treated as a discrete variable with 500 unique categories, that would probably be much slower.
To be clear, I'm not sure if this is the issue, just something to consider!
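One quick way to check this suggestion, using only pandas on the DataFrame handed to TetradSearch (a sketch, assuming `df` is the frame built in the second snippet below):

```python
# Continuous columns should be floats; discrete columns should be integer or
# categorical with a small number of levels.
print(df.dtypes)
print(df.nunique())   # a "discrete" column with ~500 unique values would be a red flag
```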
To get familiar with using Py-Tetrad, I wanted to run GRaSP-FCI on simulated data. Stealing bits of code from `jpype_example.py`, I did something like:

```python
D, G = simulateLeeHastie(num_meas=10, samp_size=500)

import edu.cmu.tetrad.search as ts_

test = ts_.test.IndTestConditionalGaussianLrt(D, 0.01, True)
score = ts_.score.DegenerateGaussianScore(D, True)
fci = ts_.GraspFci(test, score)
G_ = fci.search()
# then compare G_ to G...
```
This runs extremely quickly (< 1 second). However, when I use the TetradSearch class (`TetradSearch.py`), the computation time increases substantially using the same data:

```python
D, G = simulateLeeHastie(num_meas=10, samp_size=200)
df = tr.tetrad_data_to_pandas(D)

import tools.TetradSearch as ts

search = ts.TetradSearch(df)
search.use_conditional_gaussian_test(alpha=0.01)
search.use_degenerate_gaussian_score(penalty_discount=1)
G_ = search.run_grasp_fci()
```
I believe this is conceptually equivalent to the previous example, yet it takes 2-3 orders of magnitude longer. If I use a discrete dataset, it runs quickly and in roughly the same amount of time as calling tetrad.search directly.
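For what it's worth, a small timing harness along the lines below would make the gap concrete; it assumes the `fci` and `search` objects from the two snippets above are both in scope, and uses only the standard-library `time` module.

```python
import time

# Time the direct jpype call from the first snippet.
t0 = time.perf_counter()
G_direct = fci.search()
t1 = time.perf_counter()

# Time the TetradSearch wrapper from the second snippet.
G_wrapped = search.run_grasp_fci()
t2 = time.perf_counter()

print(f"direct jpype: {t1 - t0:.3f} s")
print(f"TetradSearch: {t2 - t1:.3f} s")
```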
I am looking for help in understanding why this difference may arise. Additionally, are there functionalities present when using the TetradSearch class that are not available in tetrad.search, or vice versa?
Thank you.