States are not nodes #205
Replies: 7 comments
-
I thought about your complaint. I guess the solution is to add the number of moves made so far in the game to the state: the move counter is a variable in the game logic, and when the state is serialized to a string, that integer is appended to the board information. That would avoid loops.
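A minimal sketch of this suggestion, assuming a board that already has a string form; the function name `string_representation` and the `#` separator are made up for illustration, not taken from the repo:

```python
def string_representation(board, moves_done):
    """Key a state by board contents plus the move counter.

    board: any object with a stable string form (here, a string)
    moves_done: number of moves played so far in this game
    """
    # Appending the move counter makes states at different depths distinct,
    # which rules out cycles (the counter only ever increases).
    return "{}#{}".format(board, moves_done)
```

Note that in many games two different move orders can still reach the same board after the same number of moves, so such keys can still collide at equal depth.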
-
No, that would not solve the problem. It is perfectly possible to reach the same state via different branches with the same number of moves. (And I am not talking about cycles (loops); I am unsure, but I think what the current code produces is a directed acyclic graph.) To me, the natural solution is to use explicitly linked objects to represent the nodes in the graph (tree). In an earlier version of Leela, the MCTS code was rather readable; I think that is what they did. I find their current MCTS code very difficult to read. Thank you for commenting. I am surprised that so much energy is spent discussing a performance improvement (parallelization) while this fundamental fault is not tended to.
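A hedged sketch of what "explicitly linked node objects" could look like for an AlphaZero-style MCTS; the class and method names (`Node`, `expand`, `select_child`) are illustrative, not from any particular implementation:

```python
import math

class Node:
    """One node per move sequence: transposed positions get distinct nodes."""

    def __init__(self, prior, parent=None):
        self.parent = parent      # link back up the tree; None for the root
        self.children = {}        # action -> Node
        self.prior = prior        # P(s, a) from the policy network
        self.visit_count = 0      # N of this node (not of the shared state!)
        self.value_sum = 0.0      # W: accumulated simulation values

    def value(self):
        # Q = W / N, the mean value observed at this node
        return self.value_sum / self.visit_count if self.visit_count else 0.0

    def expand(self, action_priors):
        # action_priors: iterable of (action, prior) pairs for legal moves
        for action, prior in action_priors:
            self.children[action] = Node(prior, parent=self)

    def select_child(self, c_puct=1.0):
        # PUCT selection rule, as used in AlphaZero-style search
        def ucb(child):
            return child.value() + c_puct * child.prior * math.sqrt(
                self.visit_count) / (1 + child.visit_count)
        return max(self.children.items(), key=lambda kv: ucb(kv[1]))

    def backpropagate(self, value):
        # Walk the parent links to the root, flipping the sign at each
        # ply for the opponent's perspective
        node, v = self, value
        while node is not None:
            node.visit_count += 1
            node.value_sum += v
            v = -v
            node = node.parent
```

Because statistics live on the node rather than in a dict keyed by state string, a position reached along two branches keeps two independent visit counts, which is exactly what the state-keyed code fails to do.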
-
I agree that explicitly linking objects into a tree would be the better solution. How do you like the implementation of MCTS in muzero-general? One may have to abstract away the MuZero-specific aspects, but it is rather clean. I was thinking of porting it back to alpha-zero.
-
A game implementer could represent each position by the list of moves made so far and use all core classes unchanged; it only requires implementing the method Game.stringRepresentation() (Line 104 in 5156c7f).
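A minimal sketch of this workaround, assuming the board already has a string form; the `MoveHistoryGame` wrapper and its attribute names are hypothetical, not part of the repo's Game class:

```python
class MoveHistoryGame:
    """Wraps a board together with the sequence of moves that produced it."""

    def __init__(self, board, history=()):
        self.board = board
        self.history = tuple(history)  # moves played from the start position

    def apply(self, move, next_board):
        # next_board: the board after `move`, computed by the real game logic
        return MoveHistoryGame(next_board, self.history + (move,))

    def stringRepresentation(self):
        # Encoding the move history makes transposed positions distinct keys,
        # at the cost of never sharing statistics between transpositions.
        return "{}|{}".format(self.board, ",".join(map(str, self.history)))
```

With this key, two different move orders that reach the same board produce different state strings, so the state-keyed search effectively behaves as if it had one node per move sequence.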
-
Somewhere there was a hint to cache each node's child v values together with the number of visits; when the node is reached again via another parent, you return the cached v for that visit number instead of searching the subtree again:

```python
def search(self, s, nvisit):
    # nvisit: how often the calling node has visited the node s
    if nvisit < self.Ns[s]:
        # return a cached v value instead of doing a full subsearch
        return self.Ws[s][nvisit]
    # ...
    # when updates are done for a node:
    v = search(...)
    self.Ns[s] = ...
    self.Qs[s] = ...
    self.Ws[s].append(v)  # cache this v value
```

So it is actually smart to avoid the tree structure, because this way we avoid repeating calculations that would give exactly the same result.
-
I guess that was #47 (comment). #47 in general looks like an earlier mention of this topic.
-
Yes, I agree that #47 discusses the same problem. The discussion of that issue correctly notes that some neural network evaluations (predict) can be avoided if we stick to the state-search-list method; I think that saves a few percent of the work at most. I repeat: the way the current code works, some edges accumulate N from two or more paths from the root, and from these inflated N values, much too high P values are derived. Despite this problem having been discussed before, it still remains in the code.
-
I am puzzled by the program's use of states instead of nodes. In the huge game tree of all possible moves, each node represents a sequence of moves from the start state to the current state.
Because at least some states can be reached via more than one sequence of moves, states are not the same as nodes in the game tree: some states correspond to more than one node.
The way this program works, a state that represents several nodes accumulates the N increments from all of those nodes and is thus seen as better, in comparison with states that represent fewer nodes, than it should be.
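A toy illustration of the effect described above, assuming a dict keyed by state strings as in a state-keyed MCTS; the state names and paths are made up for the example:

```python
Ns = {}  # state string -> visit count, as in a state-keyed search

def visit(path):
    # Simulate one search walking down `path`, bumping N for each state seen
    for state in path:
        Ns[state] = Ns.get(state, 0) + 1

# Two different move orders (a then b, or b then a) reach the same state "ab"
visit(["root", "a", "ab"])
visit(["root", "b", "ab"])   # transposition: "ab" again via the other branch
visit(["root", "a", "ac"])   # "ac" is reachable along only one path

# "ab" has now accumulated visits from both branches, so any formula that
# reads N["ab"] when comparing the children of "a" sees "ab" as more
# explored than "ac", even though each branch visited it only once.
```

With one node object per move sequence instead of one dict entry per state, the two visits to "ab" would be recorded on two separate nodes and the comparison under "a" would be fair.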