[Question] Question about log parameters #267

Open
Natmat626 opened this issue Mar 2, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@Natmat626

I have really enjoyed working with this repo. Thanks for making d3rlpy, which helped me a lot to get started in offline RL.
I have just gotten into offline RL. Many metrics are recorded in the log, such as "actor_loss", "alpha_loss", and so on, which differ from the usual metrics in online RL. Do you plan to add descriptions of these metrics to the documentation? Or how can I get a detailed description of them that a novice like me can understand?

Natmat626 added the enhancement (New feature or request) label on Mar 2, 2023
@takuseno (Owner) commented Mar 4, 2023

@Natmat626 Thanks for the issue. Actually, there is no difference between online and offline training in terms of the types of logs. Currently, beginner-friendly descriptions of the logged metrics are not documented anywhere. Sorry for the inconvenience.
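
For illustration, a minimal sketch of where these metrics typically come from, assuming the d3rlpy v1.x interface, the bundled pendulum dataset, and the default CQL settings (the exact metric names depend on the algorithm):

```python
# A minimal sketch, assuming the d3rlpy v1.x API; exact metric names depend on
# the algorithm and its settings. CQL optimizes an actor, a critic, and a
# conservative weight alpha, so "actor_loss", "critic_loss", and "alpha_loss"
# all show up among the logged metrics.
from d3rlpy.algos import CQL
from d3rlpy.datasets import get_pendulum

# small bundled pendulum dataset, used here purely for illustration
dataset, env = get_pendulum()

cql = CQL()

# per-epoch averages of each metric are written under d3rlpy_logs/<experiment>/,
# one CSV file per metric (e.g. actor_loss.csv, alpha_loss.csv)
cql.fit(dataset, n_epochs=1)
```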

@Natmat626 (Author)

@takuseno Thanks for the answer. It is a pity that there is no documentation aimed at newcomers, but I have one more simple question that I hope you can answer. In online RL, the loss curve is generally not treated as the main criterion; people pay more attention to "Episode_mean_reward" and "Episode_mean_step", because those evaluate the performance of the current model more directly.
But suppose I collect a set of expert data from a complex game environment. That means the dataset contains only positive rewards, and because the game environment is so complex, the "scorers" evaluation in the "fit" function cannot be used in such a case.
So when training with pure expert data, how should the loss curve be interpreted? I understand that it should mean something different from the loss in online RL; it may be more like the loss in supervised deep learning, where a smaller value means a better fit to the data. My idea is probably wrong, because I am just a game developer and my understanding of machine learning is quite limited. I hope you can help me answer this question. Thanks very much!
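
For context, the d3rlpy v1.x metrics module also appears to provide scorers that are computed from held-out dataset episodes rather than environment rollouts; a minimal sketch of how they would be passed to fit, assuming that interface and using the bundled pendulum dataset as a stand-in for expert data:

```python
# A minimal sketch, assuming the d3rlpy v1.x API, of scorers that are computed
# from held-out dataset episodes only, with no environment rollouts. A lower TD
# error roughly means the Q-function fits the data better; it does not by
# itself mean the policy is better.
from sklearn.model_selection import train_test_split

from d3rlpy.algos import CQL
from d3rlpy.datasets import get_pendulum
from d3rlpy.metrics.scorer import (
    average_value_estimation_scorer,
    td_error_scorer,
)

# stand-in for an expert dataset collected from a game environment
dataset, _ = get_pendulum()
train_episodes, test_episodes = train_test_split(dataset.episodes, test_size=0.2)

cql = CQL()
cql.fit(
    train_episodes,
    n_epochs=1,
    eval_episodes=test_episodes,
    scorers={
        # TD error on held-out transitions (how well the Q-function fits the data)
        "td_error": td_error_scorer,
        # average Q-value estimate; useful for spotting runaway over-estimation
        "value_estimation": average_value_estimation_scorer,
    },
)
```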
