Error for Extending the chain #11

Open
Yansen0515 opened this issue Apr 18, 2023 · 7 comments

Comments

@Yansen0515

Hi,
My model hasn't converged yet after 18.5K iterations, so I would like to extend the chain to 50K iterations. However, my program has already stopped.
I found some suggestions in "MultiEnvironmentTrial" and did the following: first I read in all the data (phenotypes, K, ...), called setup_model_MegaLMM(), added the priors, initialized the model, etc., and then reloaded current_state to resume the chain where I left off, using the code below:

MegaLMM_state$current_state = readRDS('myrun_ID/current_state.rds')

First, I printed MegaLMM_state:

Current iteration: 18500, Posterior_samples: 0
Total time: 1.954428 days

Then I got the error below:

[1] "Run 1"

  |=======                                                               |  10%
Error in record_sample_Posterior_array(current_state[[param]], Posterior[[param]],  : 
  Wrong dimensions of Posterior_array
Calls: sample_MegaLMM ... save_posterior_sample -> record_sample_Posterior_array
Execution halted

Do you have any idea why this happens?

Kind regards,
Yansen

@deruncie
Owner

deruncie commented Apr 18, 2023 via email

@Yansen0515
Author

Hi,
Thanks, your code solved my problem.
By the way, I would like to make this package run faster. I know it supports OpenMP, but it is still slow for my data.
Is it possible to run it with MPI (to use more CPUs) or on a GPU to improve the speed? I am not sure whether you added support for this when you wrote the package.

Kind regards,
Yansen

@Yansen0515
Author

Hi Dan,
Sorry, I spoke too soon. I can resume the chain with the MegaLMM example data,
but it doesn't work with my own data.
I got the following results:

print(MegaLMM_state)

 Model dimensions: factors = 500, fixed = 149, regression_R = 0, regression_F = 0, random = 3301 
  Current iteration: 20500, Posterior_samples: 0 
 Total time: 1.810295 days 

"Run 1"

  |=======                                                               |  10%
Error in record_sample_Posterior_array(current_state[[param]], Posterior[[param]],  : 
  Wrong dimensions of Posterior_array
Calls: sample_MegaLMM ... save_posterior_sample -> record_sample_Posterior_array
Execution halted

I'm not sure how to proceed, and the "Posterior_base.rds" file has been deleted.
Do you think I need to re-run everything from the beginning?

Kind regards,
Yansen

@deruncie
Owner

deruncie commented Apr 19, 2023 via email

@deruncie
Owner

deruncie commented Apr 19, 2023 via email

@Yansen0515
Author

Hi Yansen, Yes, if the posterior_base file is gone, I think you can re-generate it by running: MegaLMM_state = clear_Posterior(MegaLMM_state) Dan

Hi Dan,
This code (MegaLMM_state = clear_Posterior(MegaLMM_state)) only cleared "jobID/Posterior"; it did not regenerate "Posterior_base.rds".
PS: I don't know what went wrong with my real analysis, but following your suggestions I can restart the chain with the example data. So I decided to start a new run from scratch.

Kind regards,
Yansen

@Yansen0515
Author

Hi Yansen, Sorry that it is not faster. I would like to try these types of ideas, but haven’t had the time to fully explore them. I think the challenge is that there needs to be a lot of data passing each iteration because all calculations are so independent (at least between iterations), so there would be a lot of overhead using MPI (moving data between computers) or GPU (moving data from the cpu’s memory to the gpu), so it’s not clear if they would help. If you have ideas or would like to work on this let me know. Dan

Hi Dan,
I know it is not easy to adapt your code for MPI or GPUs. I have to finish my current work first; then I will come back to the speed issue.
I am interested in adapting this package to GPUs, and I think it could improve the speed. On our cluster, the compute nodes are interconnected with a 10 Gigabit Ethernet network.

Kind regards,
Yansen
