Subject: More about the neural net backgammon programs ...
Modified by playBunny (17 June 2005, 03:13:25)
Plies: These are single moves (replies). GnuBg was written by a computer scientist, so it starts counting at zero. [rolls eyes + shrug]. So at 0-ply it considers all the moves it can make with each of the 21 possible rolls. 1-ply adds the opponent's responses, 2-ply would be GnuBg's replies to those, and so on. However, a ply isn't quite the same as thinking ahead in chess, or as we would think ahead.
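As an aside, the "21 possible rolls" comes from treating, say, 5-2 and 2-5 as the same roll. A quick sketch (mine, not GnuBg's) to count them:

```python
# The 21 distinct backgammon rolls: 6 doubles plus 15 mixed pairs.
# A mixed roll such as 5-2 is twice as likely as a double (2/36 vs 1/36).
def distinct_rolls():
    return [(a, b) for a in range(1, 7) for b in range(a, 7)]

def roll_probability(a, b):
    return 1 / 36 if a == b else 2 / 36
```

The probabilities matter because the program averages its look-ahead over all 21 rolls, weighted by how likely each one is.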
You may remember me saying that the neural nets work by amassing huge amounts of statistical data. That data allows the program to say, with some degree of accuracy, that reaching a particular board position gives a certain winning chance. In that sense it's actually "looking ahead" from that position to the end of the game. This evaluation isn't perfect, however, because perfection requires the right values for all the possible board positions - and that's just too many. The reason that neural nets are used is that they are, so far, the best mechanism for making good approximations to data of this nature. Like us they can look at a pattern and say "hmmm, that reminds me of something very similar, I'll use that as a guideline", except that they are geared to look specifically at backgammon patterns, and can do so with great accuracy.
The way plies work is that they take the board further towards the end of the game, where these estimations are (generally) more accurate. It's a bit like running to the top of the next hill, and the next, to see what's out on the horizon.
Ideally the program would always work at 4-ply or better, calculating every possible roll and every possible move at each level. There are 21 possible rolls and anything from zero to umpteen moves (me's no computer, lol) at each ply. This degree of "branching" is much greater than in chess, and it's the reason chess programs can look further ahead; the processing required in backgammon, even at 2-ply, is huge. In order to cut down on the processing and maximise the look-ahead, the bg programs use a filtering system. The initial 21 rolls are always considered in full (this is 0-ply). The worst moves are discarded and the remainder examined for responses to the next 21 possible rolls (1-ply). The top moves are kept and the next ply examined, and so on. (For those who have recently acquired GnuBg, this is what the filter settings refer to.)
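For the curious, here's roughly what that filtering loop looks like in code - a sketch of my own, with made-up `evaluate` and `legal_moves` stand-ins, not GnuBg's actual source:

```python
# Minimal sketch of ply search with a move filter: score every candidate
# move with the static 0-ply evaluation, keep only the best few, and only
# recurse on those with the next 21 rolls. `evaluate` and `legal_moves`
# are hypothetical functions supplied by the caller.

# The 21 distinct rolls with their probabilities (doubles 1/36, mixed 2/36).
ROLLS = [((a, b), (1 if a == b else 2) / 36)
         for a in range(1, 7) for b in range(a, 7)]

def equity(position, ply, evaluate, legal_moves, keep=4):
    """Estimated equity for the player on roll, looking `ply` plies ahead."""
    if ply == 0:
        return evaluate(position)          # 0-ply: static evaluation only
    total = 0.0
    for roll, prob in ROLLS:
        moves = legal_moves(position, roll)
        if not moves:                      # dancing: position unchanged,
            total += prob * -equity(position, ply - 1,
                                    evaluate, legal_moves, keep)
            continue                       # opponent on roll next
        # The filter: rank by static score, keep only the top `keep` moves.
        best_few = sorted(moves, key=evaluate, reverse=True)[:keep]
        # The opponent moves next, so their equity counts negatively for us.
        total += prob * max(-equity(m, ply - 1, evaluate, legal_moves, keep)
                            for m in best_few)
    return total
```

Lowering `keep` makes the program faster but more likely to throw away the move that was actually best - exactly the trade-off the filter settings control.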
At each level, and for each roll and possible move, the board is evaluated. The board evaluation isn't done in the dumb sense of just saying how many pieces there are on each point. That will only be possible when a database can be constructed which holds every position in backgammon (about when "Beam me up, Scotty" is possible). Instead the program does what we do - it considers what elements are present: how many blots and points made in each home table, how many points made in the outer field, is there still a midpoint, how many spares are there on the points, how many builders are there and where, how many runners, what's attacking what, what the balance is across the board, is there still contact, is there a prime, a broken prime, a closed table, etc, etc, etc - whatever the designers can think of. These elements are what the neural net considers when it's looking for patterns (and they are partly what makes the difference between the programs). Then the statistical weightings that it has generated from the thousands of games it has played against itself say that a given set of positional elements (or, more likely, a set with a given, and high, degree of similarity) has been found to produce a win in such and such a percentage of the games that were played from that position.
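A toy sketch of that idea: a few hand-picked positional elements become numeric inputs to a small neural net whose single output is a win-probability estimate. The feature names, network size and weights below are my illustrative inventions, nothing like GnuBg's real inputs:

```python
# Toy feature-based evaluation: positional elements in, win chance out.
import math

def features(position):
    # Real programs extract dozens of inputs - blots, made points, primes,
    # contact, builders, and so on. `position` here is just a dict of a
    # few such counts (a hypothetical stand-in).
    return [position["blots"], position["points_made"],
            position["pip_lead"] / 100.0]

def win_probability(position, weights):
    # One hidden layer of tanh units, then a sigmoid to land in (0, 1).
    x = features(position)
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
              for row in weights["hidden"]]
    out = sum(w * h for w, h in zip(weights["out"], hidden))
    return 1.0 / (1.0 + math.exp(-out))
```

The training process is a matter of nudging the numbers in `weights` until the outputs match the win percentages observed in self-play.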
Because the winning chances for these positional sets are determined from self-play, you can imagine that situations which turn up again and again will be evaluated more accurately. This makes sense, as it is the same for us, too. The "degree of similarity" of the set of positional elements will improve, approaching an exact match, and the number of games that have been played from that position will be higher too. The quality of the bg programs is still high in the lesser-known positions, however, simply because the programs get to discover and play through a lot more of them than we do. A well-designed bg program will seek to ensure that the bg "state space" is explored adequately.
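A crude sketch of the bookkeeping involved: for every position met during self-play, count how often the game was eventually won. (Real programs generalise with a neural net instead of an exact table, but this is the statistic being estimated - and you can see why positions seen thousands of times get tighter estimates than rare ones.)

```python
# Self-play win statistics: games seen and games won, per position.
from collections import defaultdict

class WinStats:
    def __init__(self):
        self.games = defaultdict(int)
        self.wins = defaultdict(int)

    def record_game(self, positions, won):
        # Every position visited during the game shares its final result.
        for pos in positions:
            self.games[pos] += 1
            self.wins[pos] += int(won)

    def win_chance(self, pos):
        # Positions never seen fall back to a neutral 50% guess.
        return self.wins[pos] / self.games[pos] if self.games[pos] else 0.5
```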
Using self-play has an interesting aspect. The evaluation of any position is based on the premise that the opponent is as good as the program. The moves made will thus be chosen on the assumption that the responses will be "perfect", and the program will do its looking ahead amongst the best moves for each side. There is an occasional advantage, then, in playing an unexpected, dodgy move, because you will be leading the program along a game path that it might not have considered (having filtered out that move and path as being too poor). But making poor moves in order to fool the program may lose more than is gained, simply because they are poor moves. [Ignore this paragraph if it comes across as confusing. ;-) Hey, ignore the whole article! ;-D Lol]
Hopefully you can see that, at the furthest extreme, it's not even necessary for the programmer to know how to play backgammon! They can simply encode every possible board position and assign it the results of playing every possible game. Hey presto - the perfect robot player. This has in fact been done for hypergammon and for the ending positions in backgammon (the bearing-off stage). The robots can play these absolutely perfectly, with no consideration required other than looking up an exact board position. The next stage would be to encode every non-contact position (all my pieces have passed all yours, let's race) but that's still too big a number of positions to calculate and store.
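Those bearoff databases are built by working backwards: the value of each position is computed exactly from positions nearer the end of the game. A miniature example of the same scheme (a toy race of my own invention, not real backgammon): one checker `n` pips from home, one die rolled per turn, overshooting allowed - the exact expected number of rolls to bear off is:

```python
# Toy exact "database": each value is computed from smaller positions,
# the same back-to-front dynamic programming used for real bearoff tables.
from functools import lru_cache

@lru_cache(maxsize=None)
def expected_rolls(n):
    if n <= 0:
        return 0.0                     # already borne off
    # One roll taken now, then average over the six equally likely pips.
    return 1.0 + sum(expected_rolls(n - d) for d in range(1, 7)) / 6.0
```

Once the table is filled in, "playing perfectly" is nothing more than looking up the stored value for the exact position in front of you.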
The current situation is that a neural network can evaluate game situations by recognising the mix of positional elements. The programmer can easily code for these elements without being too good a player (although, in practice, the advice of top players has been readily utilised). The programmer isn't telling the robot how to play, however; he's telling it what to look for when considering how to recognise game situations. And it's the statistics of how many wins were produced from each game position met in the course of self-play that tell it what to play. The bg programs can only teach by saying "here's my list of moves"; they still don't really know much about how to play, lol.