Error for Extending the chain #11

Yansen0515 · 2023-04-18T12:52:21Z

Hi,
My model hasn't really converged after 18.5K iterations, so I would like to extend the chain to 50K iterations. The original run has already stopped.
I followed the suggestions in the "MultiEnvironmentTrial" example and did the following:
First, I read in all the data (phenotypes, K, ...), ran setup_model_MegaLMM(), added the priors, initialized the model, etc. Then I reloaded the saved current_state to resume the chain where I left off, using the code below:

MegaLMM_state$current_state = readRDS('myrun_ID/current_state.rds')

Printing MegaLMM_state first gives:

Current iteration: 18500, Posterior_samples: 0
Total time: 1.954428 days

Then I got the error below:

[1] "Run 1"

  |=======                                                               |  10%
Error in record_sample_Posterior_array(current_state[[param]], Posterior[[param]],  : 
  Wrong dimensions of Posterior_array
Calls: sample_MegaLMM ... save_posterior_sample -> record_sample_Posterior_array
Execution halted

Do you have any idea why this happens?

Kind regards,
Yansen
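The error says that a parameter in current_state no longer has the shape the saved Posterior arrays expect. A minimal pre-flight check can surface which parameters disagree before sampling starts. This is a sketch under an assumption not confirmed in this thread: that each Posterior[[param]] is a 3-D array of dim c(n_samples, nrow, ncol) whose last two dimensions must match dim(current_state[[param]]). The name check_posterior_dims is hypothetical and is not a MegaLMM function.

```r
# Hypothetical pre-flight check (not part of MegaLMM).
# Assumption: each stored Posterior[[p]] has dim c(n_samples, nrow, ncol),
# and c(nrow, ncol) must equal dim(current_state[[p]]).
check_posterior_dims <- function(current_state, Posterior) {
  params <- intersect(names(current_state), names(Posterior))
  mismatched <- character(0)
  for (p in params) {
    post_dim <- dim(Posterior[[p]])
    if (length(post_dim) != 3) next  # not a stored sample array; skip it
    cur_dim <- dim(as.matrix(current_state[[p]]))
    if (!identical(as.integer(post_dim[-1]), as.integer(cur_dim))) {
      mismatched <- c(mismatched, p)
    }
  }
  mismatched  # names of parameters whose shapes disagree
}
```

Calling check_posterior_dims(MegaLMM_state$current_state, MegaLMM_state$Posterior) before sample_MegaLMM() returns the mismatched parameter names; an empty result means the shapes agree under the stated assumption.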


deruncie · 2023-04-18T16:00:44Z

Hi Yansen,

I think you also need to re-attach the posterior database:

MegaLMM_state$Posterior = readRDS(file.path(MegaLMM_state$run_ID,'Posterior/Posterior_base.rds'))

See if that works. You should see the number of Posterior_samples listed when you print the MegaLMM_state at the beginning.

Dan
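The two reload steps in this thread (restoring current_state and re-attaching the Posterior database) can be wrapped into one helper that also fails early with a readable message if a file is missing. This is a minimal base-R sketch; the name resume_MegaLMM_state is hypothetical, and it assumes the file layout used in this thread: the run folder is stored in MegaLMM_state$run_ID and contains current_state.rds and Posterior/Posterior_base.rds.

```r
# Hypothetical helper (not part of MegaLMM): reload a saved chain into a
# freshly set-up MegaLMM_state. Assumed layout, as used in this thread:
#   <run_ID>/current_state.rds
#   <run_ID>/Posterior/Posterior_base.rds
resume_MegaLMM_state <- function(MegaLMM_state) {
  run_ID <- MegaLMM_state$run_ID
  state_file <- file.path(run_ID, "current_state.rds")
  posterior_file <- file.path(run_ID, "Posterior", "Posterior_base.rds")

  # Fail fast instead of hitting a dimension error partway through sampling
  needed <- c(state_file, posterior_file)
  absent <- needed[!file.exists(needed)]
  if (length(absent) > 0) {
    stop("Cannot resume; missing file(s): ", paste(absent, collapse = ", "))
  }

  # Restore the last sampled values and re-attach the posterior database
  MegaLMM_state$current_state <- readRDS(state_file)
  MegaLMM_state$Posterior <- readRDS(posterior_file)
  MegaLMM_state
}
```

After MegaLMM_state = resume_MegaLMM_state(MegaLMM_state), printing the state should list the number of saved posterior samples again.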

Yansen0515 · 2023-04-19T08:05:59Z

Hi,
Thanks, following your code solved my problem.
By the way, I would like to speed up this package. I know it supports OpenMP, but it is still slow for my data.
Is it possible to run it with MPI (to use more CPUs) or on a GPU to improve the speed? I am not sure whether you added support for this when you wrote the package.

Kind regards,
Yansen

Yansen0515 · 2023-04-19T13:21:27Z

Hi Dan,
Sorry, my earlier message was premature. Resuming works with the MegaLMM example data,
but not with my own data.
I got the following output:

print(MegaLMM_state)

 Model dimensions: factors = 500, fixed = 149, regression_R = 0, regression_F = 0, random = 3301 
  Current iteration: 20500, Posterior_samples: 0 
 Total time: 1.810295 days

"Run 1"

  |=======                                                               |  10%
Error in record_sample_Posterior_array(current_state[[param]], Posterior[[param]],  : 
  Wrong dimensions of Posterior_array
Calls: sample_MegaLMM ... save_posterior_sample -> record_sample_Posterior_array
Execution halted

I don't know how to fix this, and the "Posterior_base.rds" file has been deleted.
Do you think I need to re-run everything?

Kind regards,
Yansen

deruncie · 2023-04-19T18:28:04Z

Hi Yansen,

Yes, if the Posterior_base file is gone, I think you can re-generate it by running:

MegaLMM_state = clear_Posterior(MegaLMM_state)

Dan

deruncie · 2023-04-19T18:30:34Z

Hi Yansen,

Sorry that it is not faster. I would like to try these types of ideas, but haven't had the time to fully explore them. I think the challenge is that a lot of data needs to be passed each iteration because all the calculations are so interdependent (at least between iterations). That means there would be a lot of overhead using MPI (moving data between computers) or a GPU (moving data from the CPU's memory to the GPU), so it's not clear whether they would help. If you have ideas or would like to work on this, let me know.

Dan

Yansen0515 · 2023-04-20T07:27:56Z


Hi Dan,
Running MegaLMM_state = clear_Posterior(MegaLMM_state) only cleared the "jobID/Posterior" folder; it did not regenerate "Posterior_base.rds".
PS: I don't know what went wrong with my real analysis, but I can restart the chain with the example data using your suggestions, so I have decided to start my analysis over from scratch.

Kind regards,
Yansen

Yansen0515 · 2023-04-20T07:36:16Z


Hi Dan,
I know it is not easy to adapt your code for MPI or GPUs. I have to finish my current work first; then I will come back to the speed improvements.
I am interested in adapting this package to run on GPUs, and I think it could improve the speed. On our cluster, the compute nodes are interconnected with a 10 Gigabit Ethernet network.

Kind regards,
Yansen
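The communication overhead Dan describes can be put in rough numbers with a simple latency-bandwidth (alpha-beta) cost model: each exchange costs a fixed latency plus message size divided by link bandwidth. This is an illustrative sketch only. The 10 Gbit/s link speed comes from this thread; the matrix size, the number of exchanges per iteration, and the 50-microsecond latency are hypothetical placeholders, and protocol overhead is ignored, so the result is a lower bound.

```r
# Illustrative alpha-beta communication cost model.
# time per iteration = n_sync * (latency + bytes_per_message / bandwidth)
# All inputs are hypothetical except the 10 Gbit/s link from this thread.
comm_time_per_iter <- function(n_rows, n_cols, n_sync,
                               link_gbit_s = 10, latency_s = 50e-6) {
  bytes <- n_rows * n_cols * 8               # dense matrix of doubles
  bandwidth_bytes_s <- link_gbit_s * 1e9 / 8 # Gbit/s -> bytes/s
  n_sync * (latency_s + bytes / bandwidth_bytes_s)
}
```

With these placeholder values, comm_time_per_iter(3301, 500, 1) is about 0.0106 s (13.2 MB per message). The 3301 and 500 are borrowed from the printed model dimensions purely for scale, not because MegaLMM necessarily exchanges a matrix of that shape. The cost grows linearly with the number of exchanges per iteration, which is why how often workers must synchronize matters as much as raw link speed.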
