tonyh: The problem, which has been talked about a lot in the past, is that all games on BrainKing currently use the same rating system, one designed for Chess, which (for the sake of argument) is 95% skill, 5% luck.
So when you apply that same rating system to games like Backgammon (65% skill, 35% luck), Battleboats (25% skill, 75% luck), Ludo (30% skill, 70% luck), Dice Poker (35% skill, 65% luck), etc., a rating system designed for a mostly skill game does not produce the same results for luck games.
PLEASE NOTE: I'm not trying to start an argument about how much luck/skill goes into each game - the % that I wrote above are just quick numbers I made up.
What the site really needs is at least 2 different rating systems - 1 for mostly skill games (chess, checkers), and 1 for games that deal more with luck (dice games).
Hopefully some day Fencer will add that (and go back and recalculate ratings from the start).
Yes, the system is rubbish. I just checked out a waiting game, and I stood to lose 83 points if I lost the game and gain 1 point if I won. Why would I even bother? What encouragement is there for good players to play against novices? And winning (or losing) by a gammon or backgammon should involve a premium. It's ridiculous that the same points are awarded when one wins by a backgammon as when there is just one point difference at the end.
I am playing someone who is rated 180 points below me. If he wins, he gets 12 points; if I win, I get 4 points. So over 10 games, I would need to win 8 games to get 32 points and make a net profit of 8 points. That is just not going to happen against a reasonably skilled player; there is just too much luck in backgammon games. It means that I am virtually forced to play opponents at my current rating and so my choice of opponents is very restricted. A much more sensible approach would be to have 7/9 range for a difference of 150 rating points and 6/10 difference for all the rest.
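The arithmetic in this post can be checked with a short sketch. It assumes, because the post does not say so explicitly, that each defeat costs the higher-rated player the same 12 points that the opponent gains; `net_points` and `break_even_win_rate` are hypothetical helper names.

```python
def net_points(wins, games, gain_per_win=4, loss_per_defeat=12):
    """Net rating change for the higher-rated player over a series of games.

    Assumption: each defeat costs exactly what the opponent gains (12 points).
    """
    losses = games - wins
    return wins * gain_per_win - losses * loss_per_defeat


def break_even_win_rate(gain_per_win=4, loss_per_defeat=12):
    """Fraction of games that must be won for gains and losses to cancel."""
    return loss_per_defeat / (gain_per_win + loss_per_defeat)


for wins in (7, 8):
    print(wins, "wins out of 10 ->", net_points(wins, 10), "points")
print("break-even win rate:", break_even_win_rate())
```

Under that assumption, 8 wins out of 10 give the net profit of 8 points quoted in the post, and anything below a 75% win rate loses rating points over time.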
wetware: I've abandoned my efforts to gather any more data relating to 2nd roll behavior. Have submitted a Bug Tracker entry containing a summary of my 2008-2009 games, despite BK's a priori claim that problem reports involving dice are groundless.
Thanks to those of you who helped with the statistical analysis or provided other insights.
rod03801: I have no issue with whether another mod would have edited it or not. I am just stating that I think it was a silly moderation, IMHO. Each person has their own opinion, right? Obviously yours differs from mine. That's OK, you are a different person than I am.
playBunny: Well, considering my probable 6-0 lead in a 9 point match, I think I have a 50-50 chance, given that comparing your skill level to mine is like comparing a Boeing 747 to a whale.
My comment was completely due to the "wick" at the end of your link. That was a little uncalled for don't you think? Especially considering you "knew" it wasn't bogus !!!
paully: I find it interesting you would post such a post to make people think I did some weird stuff to climb.
Lol. I find it interesting that you think I posted it for that reason. ;-)
It's an impressive climb, almost 350 points in two weeks, and a sight to behold - for those who can see it! Lol. ;-P
If I thought it was due to cheating then I'd have said so. If others want to think that, given just a graph and no comment from me, well, it's their choice to jump to uninformed conclusions. Intelligent observers look for more evidence than a line on a graph, they look for the reason for the line on the graph.
In contrast to me not saying anything about your climb being bogus (because it isn't), you have clearly suggested malice and spite as my motivation. Don't you think that such an accusation is rather incompatible with publicly suggesting a phone call? LOL A good time to call? A good time is when you're not being such a flaming galah! ;O)
ps. I'll allow that using a wink smiley with the graph was slightly suggestive. ;-)
pps. Being a much under-rated player in the context of the BrainKing Chessgammon rating formula is bad enough but, with your paranoid projection as well, I think there's good reason to boot you from the tourney!
I won't, of course, 'cos I'm a nice bunny and I like you. But only on one condition.... You have to beat me in our match at Pocket-Monkey! Do you reckon you can do it? ;o)
playBunny: That link doesn't work for me. Being a pawn, I don't have access to my graphs, and you still own all my won membership.
I find it interesting you would post such a post to make people think I did some weird stuff to climb. I think you know me better than that, or is it out of spite 'cos we haven't spoken on the phone for a number of weeks? LOL
Speaking of which, we should do another phone hookup soon. Lemme know when a good time is.
Oh, and FWIW, my rating is still far below what it should be.
No signs of anything unusual in the individual dice values from the opening rolls of my 702 games from 2008. The distribution looks normal to me.
INDIVIDUAL DICE:
1 on die = 232 occurrences
2 on die = 245 occurrences
3 on die = 234 occurrences
4 on die = 236 occurrences
5 on die = 231 occurrences
6 on die = 226 occurrences
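One way to put a number on "looks normal" is a chi-squared goodness-of-fit test on these six counts. The sketch below is only illustrative: it treats the 1404 dice as independent, which is an approximation if opening rolls can never be doubles, and it uses the standard table value of 11.07 for the 5% level with 5 degrees of freedom.

```python
# Observed counts of each die face among the 1404 opening dice (702 games x 2 dice).
observed = {1: 232, 2: 245, 3: 234, 4: 236, 5: 231, 6: 226}

total = sum(observed.values())
expected = total / 6  # count per face if every face is equally likely

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
chi_squared = sum((n - expected) ** 2 / expected for n in observed.values())

# Standard 5% critical value for 5 degrees of freedom (from chi-squared tables).
CRITICAL_5_PERCENT_DF5 = 11.07

print(f"chi-squared = {chi_squared:.2f}")
print("consistent with a fair die at the 5% level:", chi_squared < CRITICAL_5_PERCENT_DF5)
```

A statistic far below the critical value, as here, gives no reason to suspect the individual die values.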
I'm not as confident about the paired values observed in these 702 games. But I'm not alarmed by these, either.
I think most of us didn't suspect that there was anything odd about opening rolls, considered in isolation. But whether we were skeptical or not on that point...we really had no data. But now we do. Sometimes it's worthwhile to gather data that helps us eliminate areas of concern; it can help us identify exactly what is broken here.
Next on my to-do list: analysis of my next to last pairs of rolls from my games from 2009. Between the first pairs of rolls, we've already seen greatly excessive reappearance of at least 1 die from the opening rolls. What I'd like to learn: whether there is any similar excess evident between rolls that occur later in games, or whether the focus needs to be placed upon the second rolls of games--and the routines that generate them. I have no preconception here. I'm going where the data leads.
alanback: I described simulation in my earlier post. Backgammon (playBunny, 2010-07-18 14:28:23) If you are simulating the real dice action at the start of the game then each player occasionally will get the same dice, as they do in real life, and the roll will have to be done over. If you play GnuBg then you'll see that it does this, for example "A new session has been started --- GnuBg rolls 4, playBunny rolls 4 --- GnuBg rolls 2, playBunny rolls 4". It's not strictly necessary to go through those motions, as a binary coin toss will suffice, but that's how the GnuBg programmers did it and maybe Fencer liked the idea too.
Resher: Why would anyone want to re-roll if the first rolls were the same? And what do you mean by "simulated"? I personally never had a vision of Fencer rolling an actual pair of dice every time I click ...
wetware: Your expected numbers of the different types of responder roll look right to me. Using them, I get a chi-squared statistic of 274, when I'd expect a figure of 13.8 or above for only 1 in 1000 samples (years in this case) if the dice rolls were totally independent and generated fairly. So we're talking odds of many, many millions to one against this being the case.
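For anyone who wants to reproduce that figure, here is a minimal sketch of the calculation. The counts for "exactly one die shared" are not quoted in the thread; they are derived here by subtraction from the totals, and the function names are hypothetical. For 2 degrees of freedom the chi-squared tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

# Probabilities that the responder's roll shares 0, 1 or 2 dice with the
# opener's (non-double) roll, as used in the thread: 16/36, 18/36, 2/36.
PROBS = (16 / 36, 18 / 36, 2 / 36)


def chi_squared(observed):
    """Pearson chi-squared statistic for counts of (0, 1, 2) shared dice."""
    games = sum(observed)
    return sum((obs - games * p) ** 2 / (games * p)
               for obs, p in zip(observed, PROBS))


def tail_probability(statistic):
    """P(chi-squared >= statistic) for 2 degrees of freedom: exp(-x / 2)."""
    return math.exp(-statistic / 2)


# 2008 games: 156 with no shared dice and 123 identical rolls were reported;
# the middle count is derived as 702 - 156 - 123.
games_2008 = (156, 702 - 156 - 123, 123)
# alanback's 55 games: 19 with no shared dice and 8 identical rolls reported.
games_alanback = (19, 55 - 19 - 8, 8)

for label, counts in (("2008", games_2008), ("alanback", games_alanback)):
    stat = chi_squared(counts)
    print(f"{label}: chi-squared = {stat:.1f}, tail probability = {tail_probability(stat):.3g}")
```

With these inputs the sketch gives roughly 274 for the 2008 games and roughly 9.2 for alanback's 55 games, matching the figures reported in the thread.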
I think by now that most of us are agreed on this being caused by non-independence of the opening rolls rather than non-fair "dice" being used. But data and stats are fascinating, so feel free to produce more!
Hopefully some of your analysis will give someone some insight as to when at least part of the opener's roll is used as part of the responder's roll too. My guess would be that the actual rolling is simulated and so re-rolls will be generated if both players are assigned the same first roll, and it's this re-rolling that isn't working properly. I think pB's already suggested this. Hard to test though ....
[Raw data for what follows is available on request. Just send me a message with your email address.]
Some figures based upon every one of my year 2008 BrainKing backgammon games (n=702) in which at least 2 rolls are saved in the system:
Out of 702 games played, the average expectation for the number of games in which opener's and responder's rolls will be identical = 39 games. Observed number of games in which identical rolls were seen = 123 games.
Roughly 3 times the expected frequency!
Out of 702 games played, the average expectation for the number of games in which responder's dice will both be different from opener's dice = 312 games. Observed number of games in which this occurred = 156 games. (Somebody should double-check this. It's very close if not correct, but I'm tired.)
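The expected figures can be double-checked by brute force. The sketch below enumerates all 36 equally likely responder rolls against one non-double opening roll and counts how many dice they share; `shared_dice` is a hypothetical helper, and the choice of (3, 1) as the opening roll is arbitrary.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def shared_dice(opener, responder):
    """Count how many of the responder's dice match the opener's roll: 0, 1 or 2.

    2 means the same pair of values, in either order.
    """
    if sorted(opener) == sorted(responder):
        return 2
    return 1 if set(opener) & set(responder) else 0


opener = (3, 1)  # any non-double opening roll gives the same distribution
counts = Counter(shared_dice(opener, roll)
                 for roll in product(range(1, 7), repeat=2))

probabilities = {k: Fraction(counts[k], 36) for k in (0, 1, 2)}
expected_games = {k: float(702 * probabilities[k]) for k in (0, 1, 2)}

print("outcomes out of 36:", dict(counts))
print("expected games out of 702:", expected_games)
```

This gives 16, 18 and 2 outcomes out of 36, so the expectations for 702 games are 312 with no shared dice, 351 with one shared die, and 39 identical rolls.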
wetware: I think humans tend to notice/remember items that appear near the beginnings or ends of lists or sequences. It's an effect seen in some memory tasks.
Yes, respectively, the primacy and recency effects.
playBunny: your speculation about the exceptions possibly being caused by the "swap dice" function is intriguing! My data (selected by you and shown below) showed no exceptions. I rarely click "swap dice" on the opening rolls...only when required, to get past an opponent's point.
playBunny wrote: "If one of the dice is always the same and the other is a fair roll then duplication of both would occur with a frequency of 1/6 rather than the 1/18 that's expected, so 3 times more often."
Some error like that could explain all the other figures seen so far: the excessive exact matches, the excessive near misses, etc.
And IF in 1/2 (exactly or approximately) of cases, 1 of the opener's dice is being "re-used" to generate responder's roll...we would overall also expect to see (exactly or approximately) 1/2 the number of expected cases where BOTH of responder's dice differ from opener's roll.
And that is in fact what the data showed for my 2009 games: average expectation of responder's dice both differing from opener's dice out of 137 games played = 60.9 games (137 x 16/36). Observed number = 28 games.
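That reasoning can be written down as a simple mixture model. This is only a sketch under an assumption nobody has confirmed: that in a fraction `q` of games one of the opener's dice is copied into the responder's roll while the other die is rolled fairly, and that the remaining games are completely fair. The names `FAIR`, `REUSED` and `mixture` are hypothetical.

```python
from fractions import Fraction

# Fair dice: chance of the responder sharing 0, 1 or 2 dice with the opener.
FAIR = {0: Fraction(16, 36), 1: Fraction(18, 36), 2: Fraction(2, 36)}
# If one opener die is always copied and the other is fair, the responder can
# never miss both dice, and matches the whole roll when the fair die lands on
# the opener's other value (1 chance in 6).
REUSED = {0: Fraction(0), 1: Fraction(5, 6), 2: Fraction(1, 6)}


def mixture(q):
    """Distribution of shared dice if a die is reused in a fraction q of games."""
    q = Fraction(q)
    return {k: q * REUSED[k] + (1 - q) * FAIR[k] for k in (0, 1, 2)}


half = mixture(Fraction(1, 2))
print("P(no shared dice):", half[0], "versus", FAIR[0], "for fair dice")
print("expected such games out of 137:", float(137 * half[0]))
```

With `q` = 1/2 the chance of the responder missing both dice is halved, which for 137 games means about 30 such games instead of about 61.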
playBunny: I've mentioned the excessive frequency of "near-misses" below (when discussing my 2009 data). But that could result, as you suggest, solely from the excessive re-appearance of just 1 of the dice in the responder's roll. In my 2008 data, the frequency of responder's dice not matching either of the opener's dice was only about 1/2 of expectation.
No doubt that the re-appearance of one of the dice is excessive. Later today, I'll have a better idea just how excessive it is. And I will take a look to see whether the "other" die in such cases appears to be completely independent, or also shows signs of unusual influence.
2 other notes regarding the exclusive focus upon the first 2 rolls of the game:
Psychological: I think humans tend to notice/remember items that appear near the beginnings or ends of lists or sequences. It's an effect seen in some memory tasks. That might have been a factor here. I think that repeated rolls would more easily get our attention when they occur from the commonly-seen, symmetrical, initial position. Typically, we don't have much complicated stuff to think about during the opening rolls--maybe trying to remember what's best in a GG situation--so we can afford to think about other stuff...such as the frequencies and patterns of rolled dice.
Practical: As an investigator, I can be more confident that games will contain at least 2 rolls. That doesn't always happen, due to timeouts, etc. But it makes data capture much easier.
playBunny: I agree with Thad that the number generator itself is probably okay and that it's the use that's at fault.
I think that the problem is only on the opening rolls. I know that the checking of pairs of dice within games hasn't been done yet but I suspect that a high occurrence of duplication, such as there is for the opening rolls, would have been noticed much sooner than now and by many more people.
I know that I'd occasionally notice that the opponent's opening dice came out the same as mine, or vice versa, but I never went beyond that, to seeing it as a pattern. If it were happening throughout the game then I'm sure that I would have noticed and other, more observant people would have seen it sooner.
So, assuming that it is an opening rolls issue, we must be looking for code that is special to the start of a game. One obvious contender is the rolling of dice for who goes first. In real backgammon, each player rolls a die and the one with the higher value gets both to play with. After that first move the two players pick up their individual dice and thereafter take care of their own rolls.
If I were coding a backgammon server then I wouldn't bother with that. I'd simply toss a binary digit to see who was to start and then roll the starting player's dice using the same code as every other roll. However, if I were to code a simulation of the real live start action then there'd be the opportunity for error.
What might happen then is that I use one die from each player for the first player's roll but then re-use one of those dice for the second player, presumably the die that they rolled to see who started.
This, if Fencer is doing such a simulation, is the prime suspect for the bug. If you look at the example matches below then you can see clearly that there's at least one common die in the majority of games. It's something that should be frequent (55% - the same odds as getting one man off the bar into a home table with 2 points open) but not that frequent.
If one of the dice is always the same and the other is a fair roll then duplication of both would occur with a frequency of 1/6 rather than the 1/18 that's expected, so 3 times more often.
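The 1/6 versus 1/18 comparison can be confirmed by enumeration. The sketch below models the suspected bug in a purely hypothetical way (nothing here is taken from BrainKing's actual code): each player rolls one die to decide who starts, the opener plays both dice, and the responder's roll then wrongly keeps their own starting die while only the second die is rolled afresh.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)


def identical(roll_a, roll_b):
    """True if two rolls show the same pair of values, in either order."""
    return sorted(roll_a) == sorted(roll_b)


# Hypothetical buggy start: enumerate every starting pair and every fresh die.
buggy_total = buggy_identical = 0
for opener_die, responder_die in product(FACES, repeat=2):
    if opener_die == responder_die:
        continue  # tie: would be re-rolled, so it never becomes an opening roll
    opening_roll = (opener_die, responder_die)
    for fresh_die in FACES:
        buggy_total += 1
        buggy_identical += identical(opening_roll, (responder_die, fresh_die))

# Correct behaviour for comparison: both responder dice are rolled afresh.
fair_identical = sum(identical((2, 1), roll)
                     for roll in product(FACES, repeat=2))

print("buggy model, P(identical roll):", Fraction(buggy_identical, buggy_total))
print("fair dice,   P(identical roll):", Fraction(fair_identical, 36))
```

Under this model the responder repeats the opener's roll whenever the fresh die happens to equal the opener's starting die, which is 1 chance in 6.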
The interesting thing about this bug is that there are exceptions. Although there are none in wetware's matches below, there are a few in mine and more in Resher's. That must be caused by something. One theory is that perhaps when the starting player swaps the dice before moving this somehow breaks the connection between the first and second players' dice. I can't think why that should be the case and haven't played any matches that I can test the theory with.
wetware: Are you, by any chance, recording how often the opponent gets one of the starter's dice? In the example games that I did back in November, in one of the matches that occurred with every single game.
And here are the first 5 matches on your finished games page. Red denotes one or two common dice, bold black shows where there are no common dice.
wetware: analysis of my year 2008 games is still in progress. 313 of ~700 games complete. (My earlier guesstimate of ~1000 games was off the mark.)
So far I see no sign of anything unusual in the individual dice values on the opening roll. (I'll wait for the values from all ~700 games before I look for any pairwise strangeness.)
Here were the individual dice frequencies from the first 313 games:
1 on die = 111 occurrences
2 on die = 102 occurrences
3 on die = 104 occurrences
4 on die = 100 occurrences
5 on die = 116 occurrences
6 on die = 93 occurrences
Early results from 2008 show an excess (approximately 4 times the mean expected frequency) of responder rolls exactly matching opening rolls. I'm still aiming to finish and report tomorrow. Will save my raw observed values as a text file, for anyone who'd like them. File will also include--for each game--the URL of the page showing the opening and responding dice rolled. You'll be able to check my transcription error rate :-)
playBunny: Look on the bright side, pB: we've been given a new, unannounced game/puzzle/variant!
It's quite a bit like backgammon. It's similar enough, in fact, that those who prefer to make their moves and cube decisions just as their bots dictate (these players know who they are) can continue to do so with success. Others can continue playing, without ever realizing that it's a subtle variant--but is not standard backgammon. Still others eventually wake up to the fact that the normal rules no longer fully apply, and that adjustments will be necessary to maximize their equity during play.
But it's a significant challenge to determine exactly which adjustments are needed. So it also has an added element of mystery: requiring clues, evidence, and deduction.
I rather like the social element of this new game: players working together as a team to find the solution.
nabla: But that would be too easy...cheating me of all the fun of capturing these values by hand! :-)
As I mentioned to you in a message, let me now state on the board:
"...based in part on Alan's recent comments, I'm tempted to also capture the actual rolls for aggregate analysis. Based on observed rolls from 1000+ of my games [all of year 2008], we should then have a fairly solid idea whether or not there's a problem with the basic frequency of the dice rolled. It would be good to know whether we should suspect that a fault lies there, or whether we should spend time looking elsewhere."
grenv: Even easier and more reliable, since all games are recorded: querying the database for all pairs of first roll and second roll in every game over the last x months. I'd expect the results to be overwhelmingly abnormal.
My guess would be that the random number generating function is fine. After all, think how hard it would be to try to write a random number function and end up with something that produces the results we are seeing. The bad code would be obvious. Instead, I suspect that the code is not being called properly. Consider this outline for the code:
Whose turn? - Player 1
    What do we need to do (accept double, accept draw, roll dice, etc.)
    Roll dice
    Show dice
    Player 1 moves
Player 2's turn
    What do we need to do (accept double, accept draw, roll dice, etc.)
    Show dice
    Player 2 moves
It would be pretty easy to bury a bug that could give us the results we are seeing with something like this.
Surely the easiest way to analyze this is by getting the source code and running a large number of tests... Fencer? I suggest making at least the random number generating code available so others can analyze it properly. Either it isn't a defect and that can be proved, or it is and it can be fixed.
Of course, there's no reason to believe this phenomenon is limited to opening rolls. In general, it seems one should bear in mind the enhanced probability of duplicate or similar rolls in planning strategy.
Has anyone run a test on the distribution of single die rolls? One way that these observed deviations from the norm could arise would be if, say, the chance of rolling a 4 on a single die was significantly higher than it should be. Depending on the pseudo RNG that is used, this might be a simpler explanation than any theory involving pairs of dice.
I've performed a Chi-squared test on the hypothesis that the probabilities of responder's first roll having 0, 1 and 2 dice the same as opener's roll are as they should be, that is 16/36, 18/36 and 2/36 respectively.
This is a test with 2 degrees of freedom, so if the probabilities are correct the chi-squared statistic has:
a 5% chance of exceeding 6.0,
a 1% chance of exceeding 9.2,
a 0.5% chance of exceeding 10.6, and
a 0.1% chance of exceeding 13.8
alanback's result (from 55 games) is 9.2. A result this high or greater would happen only 1% of the time, so this is enough to cause suspicion that the dice aren't following the desired probabilities. But it's not proof. Also, this test is reckoned to give very accurate results only if the expected outcomes are all greater than 5. So, with our smallest probability being 2/36, this means we need at least 90 games in our sample for me to be happy beyond reasonable doubt about the conclusion.
So, moving on to my results (100 games), I get a statistic of 102.
And lastly, wetware's results (137 games) give a statistic of 139.
Remember, if the dice rolls are working properly, there's only a 1 in 1000 chance of this chi-squared statistic being 13.8 or higher in any individual test, so the conclusion can't really be in doubt - something is wrong somewhere.
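The thresholds and the 90-game minimum quoted in this post can be reproduced in a few lines. For 2 degrees of freedom the chance of the statistic exceeding x is exp(-x/2), so each threshold is simply -2 ln(p). The function names below are hypothetical, and the minimum of 5 expected outcomes per category is the rule of thumb mentioned in the post.

```python
import math
from fractions import Fraction


def critical_value_df2(tail_probability):
    """Chi-squared value exceeded with the given probability (2 degrees of freedom).

    For 2 degrees of freedom the tail probability is exp(-x / 2),
    so the threshold is x = -2 * ln(p).
    """
    return -2 * math.log(tail_probability)


def minimum_games(smallest_probability, minimum_expected=5):
    """Smallest sample for which every expected count reaches minimum_expected."""
    return math.ceil(minimum_expected / smallest_probability)


for p in (0.05, 0.01, 0.005, 0.001):
    print(f"{p:.1%} tail -> {critical_value_df2(p):.1f}")

# Fraction keeps the division exact, avoiding floating-point rounding in ceil().
print("games needed:", minimum_games(Fraction(2, 36)))
```

The smallest category has probability 2/36, so at least 90 games are needed before its expected count reaches 5.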
playBunny: I think we all know the answer to that.
Just out of curiosity I looked at the 55 games in matches I have completed in 2010. There are 8 in which the first two rolls were the same (same two dice, order not considered), versus the predicted 3 and change. Both dice different, predicted 24, actual 19.
The crucial question, I believe, is which standard deviation will be the greater - how far from chance the return opening rolls are or how far Fencer is from caring!