Sunday, February 8, 2015

On Anomalies

I was having a think about when the prediction made by Debatebreaker might be slightly inaccurate. I came up with the following list of reasons, which I will add to should I think of more.

N.B. These aren't problems with Debatebreaker. I'm highlighting these issues here because they are things that typically happen at tournaments that can render Debatebreaker slightly inaccurate. Debatebreaker works on mathematical certainty, so it can't account for these anomalies. The day Artificial Intelligence becomes mainstream, though...

a) When swing teams get messy.

I've noticed, when looking at the tabs of some tournaments, that swing teams are sometimes treated as follows: their scores are zeroed after every round they debate, regardless of whether they win or lose, which places them back at the bottom of the tab. This is NOT due to a lack of tabbing expertise; I'll explain why.

Optimally, this shouldn't happen. Should there not be a perfect number of teams in the tournament (multiple of 4 if BP, multiple of 2 otherwise), you'd ideally want to have some debaters on standby who can then form a swing team and debate the entire tournament. That way, you wouldn't have to keep zeroing scores.

The problem is that tournaments, especially large ones like WUDC, are rarely so kind. I was part of the tab team at WUDC 2015, and issues inevitably cropped up in every round that we'd have to deal with. For instance, a team would pull out from a round due to illness, but want to debate the subsequent rounds. Teams just didn't show up for some rounds. All kinds of things. This meant that swing teams had to be slotted in, and then taken out and have their score for the round zeroed when the team they were debating in place of returned.

The result of this is that the sum total of points in the tournament is now less than it should be.
For instance, if there are 30 rooms in a BP tournament that has 5 preliminary rounds:

In every room there are:  0 + 1 + 2 + 3 = 6 points to be won
In every round there are: 30 x 6 = 180 points to be won
In the prelims there are: 5 x 180 = 900 points to be won

Zeroing scores reduces the total number of points to be won in the tournament, thus skewing the statistical distribution of the point brackets. This doesn't render Debatebreaker's prediction wrong, but it does mean that it MAY be off by 1 or 2 teams.
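To make the arithmetic above concrete, here is a minimal sketch in Python. The function names and the sample zeroed results are my own, purely for illustration; they are not part of Debatebreaker.

```python
# Points available in one BP room: 4th, 3rd, 2nd and 1st place.
BP_POINTS_PER_ROOM = 0 + 1 + 2 + 3  # 6


def total_points(rooms, rounds, points_per_room=BP_POINTS_PER_ROOM):
    """Points that should exist in the tab if nothing is zeroed."""
    return rooms * rounds * points_per_room


def points_after_zeroing(rooms, rounds, zeroed_results):
    """Points actually left in the tab once some results are zeroed.

    zeroed_results lists the points each zeroed result was worth,
    e.g. [3, 2, 1] for a swing team that took a 1st, a 2nd and a 3rd
    (hypothetical numbers).
    """
    return total_points(rooms, rounds) - sum(zeroed_results)


expected = total_points(rooms=30, rounds=5)
actual = points_after_zeroing(rooms=30, rounds=5, zeroed_results=[3, 2, 1])
print(expected, actual, expected - actual)
```

With the 30-room, 5-round example, zeroing those three hypothetical results leaves 894 points in the tab instead of 900. The shortfall is small, but it is enough to shift where the point brackets fall.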


b) When swing teams break / when there are institutional caps for breaking

This one is fairly intuitive, I think. If a swing team debates and is in a position to break but isn't allowed to, the best-case scenario is going to be better than what Debatebreaker predicts, because the teams ranked below it have been artificially bumped up the pecking order.

Likewise for when there are institutional caps on the breaking teams, e.g. a maximum of 3 teams from an institution may break.
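A rough sketch of how both effects bump lower-ranked teams into the break is below. This is only an illustration under my own assumptions: the select_break function, the tuple layout and the team names are all hypothetical, and real tab software will have its own rules.

```python
def select_break(ranked_teams, break_size, cap=None):
    """Pick the breaking teams from a tab sorted best-first.

    ranked_teams: list of (team_name, institution, is_swing) tuples.
    cap: maximum breaking teams per institution, or None for no cap.
    Swing teams are assumed to be ineligible to break.
    """
    breaking = []
    per_institution = {}
    for name, institution, is_swing in ranked_teams:
        if len(breaking) == break_size:
            break
        if is_swing:
            continue  # skipped, so everyone below moves up a place
        if cap is not None and per_institution.get(institution, 0) >= cap:
            continue  # institution already has its full quota
        per_institution[institution] = per_institution.get(institution, 0) + 1
        breaking.append(name)
    return breaking


# Hypothetical tab, best team first.
tab = [
    ("A1", "A", False),
    ("Swing", "-", True),
    ("A2", "A", False),
    ("A3", "A", False),
    ("B1", "B", False),
    ("C1", "C", False),
]

print(select_break(tab, break_size=4))         # swing team skipped
print(select_break(tab, break_size=4, cap=2))  # swing skipped and cap applied
```

In this made-up tab, B1 is ranked 5th but breaks in a break of 4 because the swing team is skipped, and C1 (ranked 6th) also breaks once a cap of 2 per institution removes A3. A prediction based purely on points can't see either of those shifts.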


c) Tab errors

I hate to have to say this, I really do. Speaking as a tabber, I have the highest empathy for the stress that comes along with tabbing. However, this is necessary to note here.

Tab errors will mess the prediction up. Prior to releasing it, I tested Debatebreaker on the tabs of nearly a hundred tournaments. It was only wrong in a single case. I was lucky enough to personally know the tabmaster for the tournament, so I contacted the person and asked if anything went wrong during the tournament. Much to that person's credit, they were completely transparent and honest in telling me that there was a tab error during one of the preliminary rounds.

What happened was that the scores for a round were keyed in wrongly, and the mistake wasn't spotted until 2 rounds later. This meant that the mistake led to sub-optimal draws for 2 rounds, which naturally messed up the bracketing and caused Debatebreaker's prediction to fall short.

To the tabmaster who so graciously responded to my queries, thank you. It's rare that people are so honest about their mistakes. Without your honesty, I might have continued fruitlessly trying to find an error in my methodology. I am very grateful that you did what you did. =)


d) No power matching used

This is also fairly intuitive. In order for Debatebreaker to work, the process of drawing matchups must have a foundation in power matching. Otherwise, at a tournament like WSDC for instance, you could hypothetically have up to 6 or 7 teams breaking on perfect wins without ever facing each other. This is not an argument against the WSDC format. It merely seeks to highlight why Debatebreaker cannot work at tournaments that don't employ power matching to generate matchups.
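To see why power matching matters, here is a small sketch for a generic two-team format. It is not a model of WSDC's actual draw rules, and both function names are my own. It assumes an even number of teams, no draws, and, in the power-matched case, that unbeaten teams are always paired against each other wherever possible.

```python
import math


def max_undefeated_power_matched(teams, rounds):
    """Upper bound on unbeaten teams when unbeaten teams meet each other.

    Each round at most half of the unbeaten teams can stay unbeaten;
    with an odd number, the leftover team is paired down and may win.
    """
    undefeated = teams
    for _ in range(rounds):
        undefeated = math.ceil(undefeated / 2)
    return undefeated


def max_undefeated_unmatched(teams, rounds):
    """Upper bound on unbeaten teams when the draw ignores results.

    After round 1 half the teams have won. In the worst case every
    winner is then drawn against a beaten team and keeps winning.
    """
    if rounds == 0:
        return teams
    return teams // 2


# Hypothetical tournament: 64 teams, 8 preliminary rounds.
print(max_undefeated_power_matched(64, 8))
print(max_undefeated_unmatched(64, 8))
```

For the hypothetical 64-team, 8-round tournament, power matching leaves at most 1 unbeaten team, while a draw that ignores results could in principle leave as many as 32. That gap is why the point brackets stop being predictable without power matching.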

