Creating Fair Horseshoe Tournaments using Programming and Reseeding

STEM 1 is taught by Dr. Crowthers. In this class, we each took on our own project, which we started brainstorming during the summer of 2020. This class taught us about time management, setting and meeting our own SMART goals, and researching well enough to become experts in our project field. It was very stressful at times, but the satisfaction of knowing I accomplished something big by pushing through stays with me.

In the game of horseshoes, teams of two are created by random draw to compete in double elimination competitions, which can produce lopsided teams whose skill far exceeds that of their competition. This often leads to unfair tournaments. Although conventional single elimination seeding methods exist, they do not function for double elimination. Therefore, the goal of this project is to build a program that decides fair teams for horseshoe tournaments from any given group of people, using each person's skill level. Skill level is based on several criteria, such as the number of points a person scores in a game, their ringer percentage, and how many of their opponents’ points they canceled. Historical data from a sanctioned horseshoe league, the Central Connecticut Horseshoe Club (CCHC), were gathered and used in a Java program that simulated common types of horseshoe tournaments. To find the teams for these tournaments, the players were ranked based on their skill level. The win rate of each seed was recorded, and the fairness of the tournament was judged by analyzing the Win Rate vs. Seed graph. The data from the seeded tournaments more closely followed the ideal line of best fit, in which lower seed numbers (higher skill levels) have a higher chance of winning. Since the results show that this program is closer to the ideal, this seeding method is fairer than random draw for double elimination. This program can be used, with adapted parameters, for other backyard sports that are based on skill, such as pool or cornhole.

In horseshoes, teams of two are created randomly, which allows lopsided teams made up of two players who are both better than most of the competition.

The goal of this project is to build a program that takes several data points into account to determine fair teams for double elimination tournaments.

Tournaments are a way of matching up teams against each other to figure out which team is the most skilled in a particular activity. The main goal of a tournament is to reward higher skill with a higher chance of victory, which is how this project defines fairness. Some sports tournaments achieve this goal through seeding, which involves ranking the players in terms of skill, with a lower seed number indicating more skill, and deciding matchups based on that ranking (Baumann, Matheson and Howe, 2010). For example, the standard seeding method consists of matching teams up so that the sums of the seed numbers in each match are constant (Schwenk, 2000). Another method ensures fairness by re-assigning seed numbers (which is called reseeding) multiple times throughout the tournament to reevaluate matchups (Baumann et al., 2010). The tournaments that this paper examined are horseshoe tournaments, in particular double elimination tournaments. While some more official horseshoe tournaments use ringer percentage, a measure of skill level, to seed, most horseshoe tournaments lack seeding or reseeding (NHPA, 2019). Horseshoe tournaments are set up by deciding teams of two, and these teams compete until a goal, which varies between types of tournaments, is reached. For example, double elimination requires a team to lose twice to be eliminated, and the goal is to be the last team that is not eliminated. The organizer of the tournament decides these teams by a random draw, which does not always lead to the best results. For example, the two most skilled people can become partners. While the tournament does favor them in terms of skill, it may not favor the third-best player, who is also very skilled but became partners with someone of low skill level. These types of matchups would not be considered fair, as the tournament favors only those top two players and does not reward any other skill level.
The main goal of this project is to develop and automate a seeding method for horseshoes tournaments and decide whether this method is fairer than current random draw methods of determining teams.

To determine if seeding would benefit double elimination horseshoe tournaments, a Java program was written using Eclipse IDE to simulate tournaments.
The tournaments were simulated mainly because this project was being completed during a global pandemic, so in-person testing was not as readily available.
Another reason is that horseshoes tournaments take a long time to complete in-person (an average of 2-4 hours per tournament).
By simulating the tournaments, data could be gathered much more quickly and easily.

To simulate the tournaments, data were gathered from an official sanctioned horseshoe league, the Central Connecticut Horseshoe Club (CCHC).
The league regularly holds competitions and publishes publicly accessible data about them.
These data include ringer percentage, number of points scored, average points per game, etc. (Sluys, n.d.).
The data were gathered and put into a CSV file for the Java program to read and simulate tournaments with.
To simulate each game, the winner was selected semi-randomly: as in real life, the better-seeded team (the one with the lower seed number) had a higher chance of winning than its opponent.
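The semi-random selection described above can be sketched as a weighted coin flip. This is a minimal sketch, not the project's actual code: the class `GameSimulator`, its method names, and in particular the probability rule `seedB / (seedA + seedB)` are assumptions chosen only to illustrate a draw in which the lower seed number is favored.

```java
import java.util.Random;

class GameSimulator {
    // Hypothetical rule (an assumption, not taken from the project):
    // team A's chance of winning is weighted by its opponent's seed number,
    // so the team with the lower seed number (more skill) is favored.
    static double winProbability(int seedA, int seedB) {
        return (double) seedB / (seedA + seedB);
    }

    // Returns the seed number of the winning team.
    static int simulateGame(int seedA, int seedB, Random rng) {
        return rng.nextDouble() < winProbability(seedA, seedB) ? seedA : seedB;
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed RNG seed so a run can be repeated
        int games = 10_000;
        int winsForSeed1 = 0;
        for (int i = 0; i < games; i++) {
            if (simulateGame(1, 4, rng) == 1) {
                winsForSeed1++;
            }
        }
        // Under the assumed rule, seed 1 should win roughly 80% of games against seed 4.
        System.out.printf("Seed 1 beat seed 4 in %.1f%% of games%n",
                100.0 * winsForSeed1 / games);
    }
}
```

Because the win chance here depends only on the two seed numbers, any rule of this kind builds the expected outcome into the simulation to some degree.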

The seeding method of choice to run double elimination tournaments was equal gap seeding.
Although this was not the fairest type of single elimination seeding method available, it was the one that could be translated into double elimination the most effectively.
The reason for this is that the double elimination tournament was treated as two single elimination tournaments, a winner’s bracket and a loser’s bracket.
Although the winner’s bracket was modeled as a basic single elimination tournament, the loser’s bracket was not as simple.
The loser’s bracket was a single elimination tournament with the players who lost in the winner’s bracket being added, which the automation for equal gap seeding could handle (Karpov, 2016).
This seeding method also functions for team numbers that are not powers of 2, using Hwang’s (1982) method of creating dummy teams, which lose to every real team they face, to artificially raise the number of teams to a power of 2.
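The dummy-team idea can be sketched as follows. This is an illustrative sketch under stated assumptions: the class `BracketPadding`, the `DUMMY` marker value, and the method names are hypothetical and are not taken from the project's program or from Hwang (1982).

```java
import java.util.ArrayList;
import java.util.List;

class BracketPadding {
    // Hypothetical marker for a dummy team (an assumption for illustration).
    static final int DUMMY = -1;

    // Smallest power of two that is greater than or equal to n.
    static int nextPowerOfTwo(int n) {
        int size = 1;
        while (size < n) {
            size *= 2;
        }
        return size;
    }

    // Seeds 1..realTeams followed by enough dummy entries to fill the bracket.
    static List<Integer> padWithDummies(int realTeams) {
        int bracketSize = nextPowerOfTwo(realTeams);
        List<Integer> bracket = new ArrayList<>();
        for (int seed = 1; seed <= realTeams; seed++) {
            bracket.add(seed);
        }
        while (bracket.size() < bracketSize) {
            bracket.add(DUMMY);
        }
        return bracket;
    }

    // A real team always beats a dummy, so the game does not need to be simulated.
    static boolean isAutomaticWin(int opponent) {
        return opponent == DUMMY;
    }

    public static void main(String[] args) {
        for (int teams : new int[] {13, 7, 4}) {
            int size = nextPowerOfTwo(teams);
            System.out.println(teams + " teams -> bracket of " + size
                    + " with " + (size - teams) + " dummy team(s)");
        }
    }
}
```

For the team counts used in this project, the sketch pads 13 teams to a bracket of 16 (3 dummies), 7 teams to 8 (1 dummy), and leaves 4 teams unchanged.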
To ensure that matches remain fair later in the tournament, the tournament is reseeded in a manner similar to the method of Arnone, Cire, and Meyerhofer (2017).
After each round, a boost is added to each team’s seeding score, and the teams' seed numbers are updated so that the tournament continues to favor skill.
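A rough sketch of this reseeding step is shown below. The names `Reseeding`, `Team`, `applyBoost`, and `WIN_BOOST` are hypothetical, and both the boost size and the choice to give it only to a game's winner are assumptions made for illustration; the source does not state how the boost is computed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class Reseeding {
    static class Team {
        final String name;
        double seedingScore; // higher score means more skill
        int seed;            // 1 is the best seed

        Team(String name, double seedingScore) {
            this.name = name;
            this.seedingScore = seedingScore;
        }
    }

    // Hypothetical fixed boost (an assumption, not the project's value).
    static final double WIN_BOOST = 5.0;

    static void applyBoost(Team winner) {
        winner.seedingScore += WIN_BOOST;
    }

    // Sort by seeding score, best first, then hand out seed numbers 1..n.
    static void reseed(List<Team> teams) {
        teams.sort(Comparator.comparingDouble((Team t) -> t.seedingScore).reversed());
        for (int i = 0; i < teams.size(); i++) {
            teams.get(i).seed = i + 1;
        }
    }

    public static void main(String[] args) {
        List<Team> teams = new ArrayList<>();
        teams.add(new Team("A", 50.0));
        teams.add(new Team("B", 48.0));
        teams.add(new Team("C", 40.0));
        reseed(teams);             // initial seeds: A=1, B=2, C=3

        applyBoost(teams.get(1));  // team B wins a game: 48.0 -> 53.0
        reseed(teams);             // B overtakes A
        for (Team t : teams) {
            System.out.println(t.seed + ": " + t.name + " (" + t.seedingScore + ")");
        }
    }
}
```

In this toy run, team B moves from seed 2 to seed 1 after its boost, which is the kind of change in matchups that reseeding is meant to produce.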

Once the tournaments were coded in with the equal gap seeding method, 3 sets of 100 tournaments were run.
In each set, there was a different number of teams in the tournament to begin, namely 13 teams, 7 teams, and 4 teams.
These numbers were chosen at random, one from each interval between consecutive powers of two, so that the results would generalize across bracket sizes.
The number of wins for each numbered seed in these sets of 100 tournaments was recorded.
The process was repeated, except the seeding method was disabled and random teams were generated to compare random draw to seeding methods.
The skill levels of each team were tracked, and were used to record win rate data, although they did not affect the structure of the tournament.
The win data were then converted to percentages, accounting for any extra tournaments accidentally added to a set.
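The conversion to percentages can be sketched as below. The sketch assumes each tournament has exactly one winner, so the total number of recorded wins equals the number of tournaments actually run; dividing by that total, rather than by a fixed 100, absorbs any extra tournaments in a set. The class name `WinRates` and the sample counts are made up for illustration and are not the project's data.

```java
class WinRates {
    // winsBySeed[i] is the number of tournaments won by seed (i + 1).
    static double[] toPercentages(int[] winsBySeed) {
        int totalTournaments = 0;
        for (int wins : winsBySeed) {
            totalTournaments += wins; // one winner per tournament (assumption)
        }
        double[] percentages = new double[winsBySeed.length];
        if (totalTournaments == 0) {
            return percentages; // no tournaments recorded: all zeros
        }
        for (int i = 0; i < winsBySeed.length; i++) {
            percentages[i] = 100.0 * winsBySeed[i] / totalTournaments;
        }
        return percentages;
    }

    public static void main(String[] args) {
        // Made-up counts that sum to 101, i.e. one more tournament than intended.
        int[] wins = {52, 30, 14, 5};
        double[] rates = toPercentages(wins);
        for (int i = 0; i < rates.length; i++) {
            System.out.printf("Seed %d: %.1f%%%n", i + 1, rates[i]);
        }
    }
}
```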

Once the win rate data were gathered, Spearman’s Rank Correlation Test was used on both samples to test the correlation between the seed number and the win rate percent.

Win Rate vs. Seed Graph, for the 4-Team Double Elimination set.

Win Rate vs. Seed Graph, for the 7-Team Double Elimination set.

Win Rate vs. Seed Graph, for the 13-Team Double Elimination set.

Results from conducting Spearman's Rank-Order Correlation Test on all 3 sets of tournaments.

Spearman’s Rank-Order Correlation Test was used to find the presence and strength of correlation between seed number and win rate. Since seed number decreases as skill increases, the goal was to show that the negative correlation between seed number and win rate was stronger for the seeded tournaments than for the random-draw tournaments. The ρ values generated by this test, shown in Table 1 above, represent the strength of this correlation as a number between -1 and 0 in this case. As ρ approaches -1, the strength of the correlation increases (Spearman’s Rank-Order Correlation - A Guide to How to Calculate It and Interpret the Output., n.d.). In each set of tournaments, the degrees of freedom were two less than the number of teams. Although the seeded tournaments have a ρ value closer to -1 than, or equal to, that of the random-draw tournaments in all cases, there is not enough evidence that these correlations exist for the 4- and 7-team double elimination tournaments (since p values > 0.05).
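For readers who want to see how ρ is obtained, the sketch below computes Spearman's coefficient as the Pearson correlation of the ranks, giving tied values their average rank. The class `Spearman` and the sample win rates are illustrative assumptions and are not the project's code or data; the sketch does not compute p values.

```java
import java.util.Arrays;

class Spearman {
    // Ranks from 1..n, with tied values sharing the average of their ranks.
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) {
                j++;
            }
            double averageRank = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) {
                ranks[order[k]] = averageRank;
            }
            i = j + 1;
        }
        return ranks;
    }

    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = Arrays.stream(x).sum() / n;
        double meanY = Arrays.stream(y).sum() / n;
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    // Spearman's rho: Pearson correlation applied to the ranks.
    static double rho(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        double[] seeds = {1, 2, 3, 4};
        double[] winRates = {50, 30, 15, 5}; // made-up, strictly decreasing
        System.out.println("rho = " + rho(seeds, winRates));
    }
}
```

With strictly decreasing win rates, as in the made-up sample, ρ is -1 up to floating-point rounding, which corresponds to the ideal case in which every better seed wins more often than every worse seed.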

The seeding method presented overall showed a stronger rewarding of skill than random draw.
However, the evidence for this is stronger at higher team numbers.
This is unlike other single elimination seeding methods, which focus on small numbers of teams to perfect fairness due to the difficulty of generalizing seeding methods for higher numbers of teams (Groh, Moldovanu, Sela, and Uwe, 2008; Arlegi and Dimitrov, 2020; Chung and Hwang, 1978; Khatibi, King, and Jacobson, 2015).
Another advantage of this seeding method is that it functions for team numbers that are not powers of two. This is not implemented in any single elimination seeding method the author is aware of besides the one by Hwang (1982).

The automation of this seeding method, however, also has its flaws.
The simulation of the games used skewed random number generation that depended on the seed numbers, so that a better-seeded team had a higher but predetermined chance of winning a game, which could bias the data and correlations.

In the future, this program could be transformed into a website or an app, with a user interface to input data about who won each game.
Anti-cheating and anti-sandbagging programs from Arnone, Cire, Ross, and Meyerhofer (2019) could be incorporated into the website/app to ensure a fun tournament for all participants.
Missing data analysis protocols, such as Maximum Likelihood, could be used in conjunction with a small sample of data – for example, having a person throw 20 horseshoes and record any points and ringers they scored – to give any new player a seeding score to be used (Baraldi and Enders, 2010).
This seeding method could also be expanded to other types of tournaments, such as Round Robin.

In conclusion, this new double elimination seeding method is more effective than random draw but needs more evidence to support this for a lower number of teams.
This opens the seeding field to investigate seeding methods for tournament types other than for single elimination, which could help popularize these other types of tournaments.
The program used to automate the seeding method can be used in tournaments for other backyard sports besides horseshoes, such as cornhole or pool.
The parameters that the program used to find skill would be adjusted for those sports.

Below are my references in PDF form. They can also be accessed here.

Below is my February Fair Poster and slides, in PDF form. They can also be accessed here.