AbigailII: *nod* That's how I expected it to work when I first paid attention to the ratings.
But compare that to a 2-game match in which both players win one game: that counts as a single draw, while under the rating system it should actually be a slight loss of BKR for the player who won first and a slight gain for the player who won the last game.
Since those matches are calculated as a draw, I would expect two separate games against the same player, directly after one another, a win and a loss, to have the same result as the 2-game match that was drawn (although the change in BKR might differ slightly, because the match is calculated as one event while the two separate games are calculated as two events).
AbigailII: Actually I have noticed that a win-loss against the same player is a net gain for both players in certain circumstances. This does seem like a flaw in the system.
Hrqls: shouldn't a win + a loss even out to a draw?
No. A win followed by a loss is slightly worse than two draws, while a loss followed by a win is slightly better. It's easiest to see with two players of equal rating and an equal number of games played. Assume their current ratings are R. Then a win gives an increase of r points, a loss gives a decrease of r points, while a draw doesn't change the ratings. So, after drawing the first game, it's still an r-point change for a win/loss, and no change for a draw. With two draws, both players still have rating R. But what if player A wins the first game? Then her rating will be R + r, while player B's rating will be R - r. So the change in rating for player A for the second game will be an increase of p for a win, a decrease of s for a loss, and a decrease of q for a draw, with 0 < q, p < r < s. So, if player A wins, then loses, her rating will be R + r - s < R. And player B, who first loses, then wins, will end up with a rating higher than R.
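The asymmetry described above is easy to demonstrate with a standard Elo-style update done after every game. This is a minimal sketch with textbook Elo conventions (K-factor 32, 400-point scale); the site's exact BKR parameters may differ.

```python
def expected(ra, rb):
    """Expected score for a player rated ra against one rated rb."""
    return 1 / (1 + 10 ** ((rb - ra) / 400))

def update(ra, rb, score, k=32):
    """Player A's new rating after scoring `score` (1 win, 0.5 draw, 0 loss)."""
    return ra + k * (score - expected(ra, rb))

# Both players start equal; ratings are recalculated after every game,
# as happens on BrainKing.
a, b = 1500.0, 1500.0
a, b = update(a, b, 1), update(b, a, 0)   # A wins game 1
a, b = update(a, b, 0), update(b, a, 1)   # A loses game 2
print(round(a, 1), round(b, 1))  # A ends slightly below 1500, B slightly above
```

After game 1, A is rated above B, so A's second-game loss costs more than r while B's win gains more than r, exactly as the post argues.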
A win and a loss will be equivalent to two draws only if the rating is calculated after the entire match - but not if you calculate ratings after each result (which is what happens on Brainking).
What I find funny about the formula used right now is that when you win and then lose to the same player, both you and your opponent end up with a net gain in BKR.
(Of course your opponent has to be within 400 points of your BKR, and sometimes it doesn't show because the net change can be less than 1, but you will see it clearly when you aren't established yet.)
AbigailII: That's filled a gap. I'm not a chess player and only read as much of that Chess Rating link's info as needed to work out how my provisional ratings were being generated, and to note how much more complicated their scheme is than that in backgammon! Now that you mention matches I've reread that page and seen where match length comes into it. Thanks Abigail :-)
whoo hoo. 30 - 40 would have made it worth an objection
still the same dimension, then, and not at all touching
the core of the issue. Checked a year ago for the last time, btw ... ~*~
which has no impact in backgammon - some game types aren't that crowded ...
in small pente I am so far ahead, there are only 3-4 players within 400 points
- still I bother to play for a team to be available sometimes after all ... ~*~
playBunny: In chess I believe you play only single games and each game is worth 1 or 1/2 a point whereas in backgammon there are matches worth multiple points.
It's not as simple as that. The chess rating system works just fine over longer matches - in fact, it works even better. If the real ratings of two players were known, one could calculate by what margin a player would win (or lose). For instance, a rating may predict that one player would win 65% of the games. Obviously, this is impossible in a single game (the score being 0%, 50% or 100%), but in a 10-game match it certainly is possible to score 65%, or at least get closer to it than 50%. In chess, if players play a match of more than one game, or even a complete tournament, ratings aren't adjusted game-by-game; instead the result of the entire match or tournament is used. So, if you play a match or tournament in which, according to your rating, you should score 58%, but you only scored 45%, your rating will drop.
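The match-level adjustment described above (expected 58%, scored 45%, rating drops) can be sketched with standard Elo conventions. The specific ratings and per-game K-factor below are my own illustrative assumptions.

```python
def expected(ra, rb):
    """Elo expected per-game score for a player rated ra against rb."""
    return 1 / (1 + 10 ** ((rb - ra) / 400))

# A 10-game match: the higher-rated player was expected to score about 58%
# but only managed 45%, so the single match-level update lowers the rating.
ra, rb = 1550, 1494             # chosen so expected(ra, rb) is roughly 0.58
games, actual = 10, 0.45
k = 10                          # per-game K-factor (an assumption)
delta = k * games * (actual - expected(ra, rb))
print(round(delta, 1))          # negative: the rating drops
```

Note the update is applied once over the whole match, rather than game by game as on BrainKing.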
The chess formula is based on single games where skill is the only factor. A player deemed better than another is expected to win by skill alone, so when the difference between them is large, the gains from winning are meagre and the losses from losing are punitive for the better player.
In the chess formula, a rating difference of 400 points favours the expert who is expected to win 9/10ths of their games against the average player. In the backgammon formula, the effect of luck is such that experts (500 higher than average) are expected to lose in the region of a third(!) of games against an average player. The losses and gains are much less per match to account for this luck effect.
In chess I believe you play only single games and each game is worth 1 or 1/2 a point, whereas in backgammon there are matches worth multiple points. Though the expert backgammon player is expected to win only two thirds of their single games against an average player, in a 7-point match that goes up to around 80%. So an expert is expected to win a decent-length match, but the chances of the beginner's lucky win are by no means negligible.
Chess maxes out at 2700 or so, while in backgammon 2200 is unusual.
Press the [Newbie] button (it uses 1600, not 1500) and the [500] button and look at the percentages in the first table to see that the expert, P2, should only win 64% of single games but 82.1% of 7-point matches and 90.3% of the 25-pointers.
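The commonly published backgammon (FIBS-style) rating formula reproduces the single-game and 7-point figures quoted above; the site's own table may be computed slightly differently, so treat this as a sketch of the standard formula rather than the site's exact code.

```python
import math

# FIBS-style winning probability: the underdog's chance in a match of
# length N, given a rating difference D, is 1 / (10^(D*sqrt(N)/2000) + 1).
def favourite_win_prob(diff, match_length):
    """Probability the higher-rated player wins a match of the given length."""
    underdog = 1 / (10 ** (diff * math.sqrt(match_length) / 2000) + 1)
    return 1 - underdog

print(round(favourite_win_prob(500, 1), 3))   # ~0.64 for a single game
print(round(favourite_win_prob(500, 7), 3))   # ~0.821 for a 7-point match
```

The sqrt(match_length) term is what makes longer matches favour the stronger player, as discussed in the posts above.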
grenv: Your observation seems to be correct as long as the opponent is within 400 points of your rating. If the difference is greater than 400 points, then you will be penalized heavily for losing while gaining very little if you win. This is why I try to limit my opponents to those within 400 points of my rating.
Mike UK: I've always been concerned by the fact that after a while I tend to just go up or down 8 when I win or lose.
The problem here is that if I am even a little above average and win, say, 55% of my games, I will eventually move my rating up as I play a lot of games.
alanback: I think the underlying theory is similar in both systems, but in practice they behave very differently. First of all, the parameters in the USCF formulae are set up for chess and are not suitable for games like gammon, where luck plays such an important role. Secondly, the provisional formulae are designed to allow a relatively small number of new entrants to quickly reach their correct rating in a large established pool of players; when applied to a startup situation they just introduce a random element. Similarly, the formulae for established players only work in a mature rating system. So even for chess, the rating distributions here are nothing like those of the USCF itself. You only have to look at the number of players here who achieve the rating ceiling of 2700 to see this. At times it seems that ratings are just proportional to the number of games played.
As you know, on FIBS everyone starts at 1500 and has to work their way up (or down) the ratings over the course of at least 400 games. Because of this, nobody gets a high rating by luck. By the very nature of gammon, it is impossible to reach a realistic rating playing fewer games than that.
The USCF itself uses a different rating system for correspondence chess which I believe is a lot like the FIBS system. I think this would be the obvious one to use at a site like this for chess and the other games. Again probably without the provisional formulae.
Mike UK: Do you understand the two rating systems well enough to explain the differences? I thought they were basically the same, but clearly they are not.
I don't really have a problem with players deciding to rest on their laurels. The problem is that their ratings are unrealistic to begin with. As has been said many times before, the rating system here is not suitable to the gammons. FIBS works well at dailygammon and GT. It is a simple formula so why can't we implement it here. If someone leaves with a high FIBS rating at least you know they earned it.
alanback: The problem is that building this into the formula doesn't have any effect on those that choose not to play at all.
If there was some way to penalize not playing, perhaps some sort of natural decay could be built in to the rating.
Alternatively just remove players who don't either start or finish a game in a set period (I suggest 3 months). If they play again after that they get a provisional rating again.
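A reversion-to-provisional rule like the one suggested above could be sketched as follows. The 3-month (90-day) window comes from the post; the function and field names are my own assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical sketch of the proposed inactivity rule: a player who has
# neither started nor finished a game in 3 months reverts to provisional.
def is_provisional(last_activity: datetime, now: datetime) -> bool:
    return now - last_activity > timedelta(days=90)

now = datetime(2006, 1, 1)
print(is_provisional(datetime(2005, 9, 1), now))   # True: over 3 months idle
print(is_provisional(datetime(2005, 12, 1), now))  # False: recently active
```

On this rule, simply starting one game resets the clock, which matches the "play one game, and back on there" behaviour mentioned below.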
playBunny: The observation I referred to was statistical -- I can't prove it, but the point was that any history more than 400 experience points old had little effect on your rating -- I think we have all experienced how ratings can swing. If you've been winning recently, your rating is relatively high; if you've been losing, relatively low; it doesn't matter much what it was this time last year. This is different from the 400 points needed to get past the "newbie" factor, of which I am also aware.
It should only be an additional value - why punish serious players who play continuously?
Regardless, a rating penalty for inactive players seems reasonable after a while;
let's say 100 points down after 3 months of not giving a flying stuff about challenges ... ~*~
alanback: FIBS uses a backgammon formula which is used on many sites. Like the chess formula that is used here, it encapsulates the entire playing history. The 400 figure that you're remembering is used when a new player is establishing their rating. For the first 400 experience points the amount gained or lost by a match is multiplied by a number proportional to how many of the 400 points are left. The multiple is 5 at the start and 1 by the time the player has reached 400.
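The ramp described above can be written down directly. This is a sketch of the commonly described FIBS multiplier (linear from 5 down to 1 over the first 400 experience points); treat the exact linear form as an assumption.

```python
# FIBS-style experience multiplier for new players: the per-match rating
# change is scaled by this factor while experience is below 400.
def experience_multiplier(experience):
    """5 at zero experience, falling linearly to 1 at 400 experience."""
    if experience >= 400:
        return 1.0
    return 1.0 + 4.0 * (400 - experience) / 400

print(experience_multiplier(0))    # 5.0 at the start
print(experience_multiplier(200))  # 3.0 halfway through
print(experience_multiplier(400))  # 1.0 once established
```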
I also don't care to see the same 6/0/0 at the top of the rankings table. Perhaps the cocktail has lost his bottle? I like pgt's suggestion of using a limited history but it might be expensive to administer. A reasonably easy to write method would be that the ratings are recalculated every day for every player (and presumably for every game type). I don't know how much server time that would take but it would certainly be a growing amount as the site gains in popularity. Doing it monthly would be a reasonable compromise; a different set of players could be done on each day of the month.
each player holds a position on a ladder, lower players can challenge higher players, they have to accept (or drop a bit on the ladder) ... the outcome of the game (if accepted) calculates the new ladder position for both players
it would introduce a new system next to the bkr .. and i am not sure if fencer likes this ... but it sounds interesting
Inactive players would drop because they don't accept the challenges.
Of course this will be tough for the top player, because he will receive a lot of challenges which he might not be able to accept (a person can only play a limited number of games in a given time).
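The ladder idea sketched in the posts above could look something like this. All rules here are assumptions drawn from the description (a successful lower-ranked challenger swaps places with the defender); real ladder sites use more elaborate reshuffling.

```python
# Toy sketch of a challenge ladder: positions are list indices, with
# index 0 at the top. A winning challenger from below swaps places
# with the defender; otherwise the ladder is unchanged.
def resolve_challenge(ladder, challenger, defender, challenger_won):
    ci, di = ladder.index(challenger), ladder.index(defender)
    if challenger_won and ci > di:   # larger index = lower on the ladder
        ladder[ci], ladder[di] = ladder[di], ladder[ci]
    return ladder

ladder = ["alice", "bob", "carol", "dave"]
print(resolve_challenge(ladder, "carol", "alice", True))
# carol takes alice's spot at the top
```

A decline-to-accept penalty (dropping a rung, as suggested above) would just be another swap with the player immediately below.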
pgt: Certainly worth considering. For those of us with paid subscriptions, it would not be a hardship; but Pawns who play a lot of different game types might find it hard to keep up.
alanback: You need to read to the original post on the subject - the peeve concerned players achieving a high rating and then refusing invitations to continue playing. Hence the "elapsed time" suggestion.
pgt: I've heard it said that on FIBS, your rating is pretty much determined by your most recent 400 experience points anyway. So, why not base ratings on that?
BIG BAD WOLF: Why not change the ranking so that only games completed in the last 6 (or maybe 12) months are included in the rankings? Then we would get a better idea of current form, scores gained while "learning" games would eventually be eliminated, and the peeve would disappear. (Anybody without sufficient completed games in the chosen period could revert to a provisional ranking.)
bumble: Yeah, Fencer talked about it - not sure how high it is on his list. Something like being removed from the ranking list after 2 months (Pawn) or 3-4 months (Knight and above), or something like that. Of course, play one game and you're back on there - but at least it keeps players a little bit active.
alanback: I'm sure I read on another board that Fencer is addressing that problem and that something along the lines of what you suggest is to be implemented.
My pet peeve on this site is players who achieve a ridiculously high rating in just a few games (I still don't understand how the rating system allows that to happen) and then just sit there refusing to play more games. I have had a challenge outstanding with the #1 ranked Nackgammon player for months with no response. He is not obligated to play me, but I think he should be obligated to play someone and defend his position. A very high rating based on a limited number of games is not an accurate indicator of ability in any case. Some system should be devised to prevent players from sitting forever at the top of the ratings without playing. Perhaps they could be moved back into provisional status if they don't finish a game in a given timespan (such as two months).
Chess:
One move has about 30 possible ways to play.
Backgammon:
One move has about 21·X possible ways to play.
This X is about 8 to 40 and depends on the position. Looking 8 plies (4 full moves) ahead with X = 35 (a simple middlegame position) would require examining:
(21·35)^8 = Oh my God!
So no minimax or alpha-beta would help.
Note:
21 is the number of different possible rolls.
X is the number of different possible plays for a single one of those 21 rolls.
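The arithmetic behind that "(21·35)^8" is easy to spell out:

```python
# 21 distinct dice rolls times roughly 35 plays per roll, raised to
# 8 plies of lookahead, using the post's own middlegame estimate X = 35.
rolls, plays_per_roll, plies = 21, 35, 8
positions = (rolls * plays_per_roll) ** plies
print(f"{positions:.2e}")   # on the order of 10^22 positions
```

That is hopelessly large for brute-force search, which is why the programs rely on evaluation plus heavy filtering instead.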
I don't understand. I thought it must be very simple to compute the probabilities using brute-force analysis, so why use neural networks or statistical analysis for backgammon?
Subject: Re: More about the neural net backgammon programs ...
playBunny: thanks! that was very useful :)
I will play a lot more against gnubg (I already notice some changes in my play .. I now know the 5-point is very important .. and I don't go too deep into my home board at first (I noticed as well that gnubg didn't like that in the analyses))
What is superfluous? I thought it was always better to have 2 anchors directly next to each other instead of just 1? It improves your chances a lot in the end (when you are not ahead in pips).
Plies: these are single moves, i.e. one player's turn. GnuBg was written by a computer scientist, and it starts counting at zero. [rolls eyes + shrug]. So at 0-ply it is considering all the moves that it can make with each of the 21 possible rolls. 1-ply is the player's responses, 2-ply would be GnuBg's replies to those, etc. However, a ply isn't quite the same as thinking ahead in chess, or as we would think ahead.
You may remember me saying that the neural nets work by amassing huge amounts of statistical data. That data allows it to say with some degree of accuracy that getting to a particular board position gives a certain winning chance. In that sense it's actually "looking ahead" from that position to the end of the game. This evaluation isn't perfect, however, because perfection requires the right values for all the possible board positions - and that's just too much. The reason that neural nets are used is because they are the best mechanism, so far, of making good approximations for data of this nature. Like us they can look at a pattern and say "hmmm, that reminds me of something very similar, I'll use that as a guideline" except that they are geared to look specifically at backgammon patterns, and can do so with great accuracy.
The way plies work is that they take the board further towards the end of the game where these estimations are (generally) more accurate. It's a bit like running to the top of the next hill and the next to see what's out on the horizon.
Ideally the program would always work at 4-ply or better and calculate every possible roll and every possible move at each level. There are 21 possible rolls and anything from zero to umpteen moves (me's no computer, lol) at each ply. This degree of "branching" is much more than in chess and is the reason why chess programs can look ahead further; the processing required in backgammon, even at 2-ply, is huge. In order to cut down on the processing and maximise looking ahead, the bg programs use a filtering system. The initial 21 rolls are always considered in full (this is 0-ply). The worst moves are discarded and the remainder examined for responses to the next 21 possible rolls (1-ply). The top moves are kept and the next ply examined, and so on. (This, for those who have recently acquired GnuBg, is what the filter settings refer to.)
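As a rough illustration of that filtering scheme (not GnuBg's actual code: the random "evaluation" stands in for the neural net's winning-chance estimate, and the move generator is a placeholder):

```python
import random

random.seed(0)

def legal_moves(position):
    # Placeholder: pretend each position has 8 candidate follow-ups.
    return [position + (i,) for i in range(8)]

def evaluate(position):
    # Placeholder for the neural net's winning-chance estimate in [0, 1).
    return random.random()

def search(position, plies, keep=3):
    """Score every candidate, but only expand the best `keep` deeper."""
    if plies == 0:
        return evaluate(position)
    scored = sorted(legal_moves(position), key=evaluate, reverse=True)
    # The filter: the worst moves are discarded before the next ply.
    return max(search(p, plies - 1, keep) for p in scored[:keep])

best = search((), 2)
print(best)   # a value in [0, 1): the best filtered-line estimate
```

The `keep` parameter plays the role of GnuBg's filter settings: larger values mean more candidates survive to the next ply, at a steep cost in processing.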
At each level, and for each roll and possible move, the board is evaluated. The board evaluation isn't done in the dumb sense of just counting how many pieces are on each point. That will only be possible when a database can be constructed which holds every position in backgammon (about when "Beam me up, Scotty" is possible). Instead the program does what we do - it considers what elements are present: how many blots and points made in each home table, how many points made in the outer field, is there still a midpoint, how many spares are there on the points, how many builders are there and where, how many runners, what's attacking what, what the balance is across the board, is there still contact, is there a prime, a broken prime, a closed table, etc, etc, etc; whatever the designers can think of. These elements are what the neural net considers when it's looking for patterns (and they are partly what makes the differences between the programs). Then the statistical weightings that it has generated from the thousands of games that it has played against itself say that a given set of positional elements (or, more likely, a set with a given (and high) degree of similarity) has been found to produce a win in such and such a percentage of the games that were played from that position.
Because the winning chances for these positional sets are determined from self-play, you can imagine that situations that turn up again and again will be more accurate. This makes sense as it is the same for us, too. The "degree of similarity" of the set of position elements will improve, approaching an exact match, and the number of games that have been played from that position will be higher too. The quality of the bg programs is still high in the lesser known positions, however, simply because the programs get to discover and play through a lot more of them than we do. A well designed bg program will seek to ensure that the bg "state space" is explored adequately.
Using self-play has an interesting aspect. The evaluation of any position is based on the premise that the opponent is as good as the program. The moves made will thus be on the assumption that the responses will be "perfect" and the program will do its looking ahead amongst the best moves for each side. There is an occasional advantage, then, in playing the unexpected dodgy move because you will be leading the program along a game path that it might not have considered (having filtered out that move and path as being too poor). But making poor moves in order to fool the program may lose more than is gained, simply because they are poor moves. [Ignore this paragraph if it comes across as confusing. ;-) Hey, ignore the whole article! ;-D Lol]
Hopefully you can see that at the furthest extreme it's not even necessary for the programmer to know how to play backgammon! They can simply encode every possible board position and assign it the results of playing every possible game. Hey presto - the perfect robot player. This has in fact been done for hypergammon and for the ending positions in backgammon (the bearing off stage). The robots can play these absolutely perfectly with no consideration required other than looking up an exact board position. The next stage will be to encode every non-contact position (all my pieces have passed all yours, let's race) but that's still too big a number of positions to calculate and store.
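The endgame-database idea above amounts to perfect play by pure table lookup. This toy sketch shows the shape of it; the position keys and winning chances here are invented placeholders, not real bearoff values.

```python
# Sketch of a precomputed position database: for small enough games
# (hypergammon, bearoff endings) every position's value can be stored,
# and "evaluation" is just a dictionary lookup with no search at all.
bearoff_table = {
    # (player on roll, opponent) -> winning chance (values invented here)
    ("2 men on 1-point", "3 men on 1-point"): 0.78,
    ("3 men on 1-point", "2 men on 1-point"): 0.31,
}

def perfect_winning_chance(position):
    return bearoff_table[position]   # no neural net, no lookahead

print(perfect_winning_chance(("2 men on 1-point", "3 men on 1-point")))
```

The limiting factor, as the post says, is storage: the table must hold an entry for every reachable position, which is feasible for bearoff but not for the full game.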
The current situation is that a neural network can evaluate game situations by recognising the mix of positional elements. The programmer can easily code for these elements without being too good a player (although, in practice, the advice of top players has been readily utilised). The programmer isn't telling the robot how to play, however; he's telling it what to look for when considering how to recognise game situations. And it's the statistics of how many wins were produced from each game position that was met in the course of self-play that tells it what to play. The bg programs can only teach by saying "here's my list of moves"; they still don't really know much about how to play, lol.
Walter, CM100: First off, sorry for missing that bit about the game properties, and thanks for answering it George. (And yes, I analysed it as a single-point match.)
Much of my learning with GnuBg has been of the form: do a move, get told off about it and then, in the absence of any verbal explanation from the program, rationalise the moves that it says are best. Often this is helped by the fact that there will be part of the move which is common in all the top moves, eg getting a backrunner moving or making a point.
That move 6. The best move given by GnuBg was to hit on the 11, as I said. The second was to bounce off the bar to 16 and start getting home. Your move was deemed 5th out of the 6 that were possible. Guessing to the utmost, I think there are several factors. Your move hit on the 2-point in your table, which is deep, and GnuBg doesn't usually care to go deep. And the hit gave no particular advantage, given that there were 5 points open. It even thought that simply moving your blot from 10 into 4 would have been better. More importantly, I think, it wasn't comfortable with you having 5 men in my home table. You had anchors on 4 and 5, so one of these is superfluous. The 5-anchor is better, of course, but hitting me on 16 would have started you on the way home and evened the pip count by 14, reducing my lead to only 10 pips.
Damn right with "Luck beats skill". It takes a lot of skill to overcome a little bad luck and a little luck to beat a lot of skill. ;-) That's very apparent from having watched so many tournaments at VogClub (they're over in a couple of hours). The top players win more often, of course, but they frequently go out to some of the weakest players.
The Backgammon Rating Formula as used by most of the bg sites calculates that top players (2100-2200; the maximum is lower than in the chess system) playing an average player (1500) will actually lose about a third of the individual games played. That indicates how much luck the formula reckons to be in the game.
I think we'd both agree that you won that particular game due to luck. All those hits and me dancing for half the game while you romped 5 men back from my home table without a care in the world?!! Lolol. I was looking forward to a good jousting match between our respective knights but your knights snuck home in the dark of the night while my King was getting drunk.
As for who's better? Time will tell, my friend. You have the lead so far. ;-)
But does this machine know what it's talking about? Well, there are certain game plans/styles/situations where it is less accurate than others - mainly because they are less common. Back games and near back games such as ours may well be in that category; certainly it was true in the past, but the databases improve with every release. I don't know enough about that area to state much.
They do play each other. There's a program vs program tournament held every year but attendance is dropping as it's expensive to enter and the existence of GnuBg as a world-class and free program means that revenue has dropped for the professionals.
Walter Montego: I'd have to disagree; your move 6 should have captured the 11 blot, in my opinion. For one thing, your chance of being hit next turn is less, and you knocked his piece further back. Easy in hindsight though, eh?