Purpose of VoteFair popularity ranking
VoteFair popularity ranking identifies the most popular choice, the second-most popular choice, the third-most popular choice, and so on down to the least popular choice.
Calculation details for VoteFair popularity ranking
The VoteFair popularity ranking method is described in the VoteFair ranking Wikipedia article (which redirects to the article about the mathematically equivalent "Kemeny-Young method"). The VoteFair marble machine entertainingly demonstrates VoteFair popularity ranking for three choices.
Briefly, here are the main points about VoteFair popularity ranking:
- The ballots are counted using pairwise counting. This counting method is described in the Wikipedia article
and it is demonstrated in the
Pairwise Counting Skit.
- The method considers every possible sequence of choices in terms of which choice might be most popular, which choice might be second-most popular, and so on down to which choice might be least popular.
- Each such sequence is associated with a sequence score that is equal to the sum of the pairwise counts that apply to the specified sequence.
(As explained below, these sequence scores do not need to be calculated.)
- The sequence with the highest score is identified as the overall ranking, from most popular to least popular.
Another way to understand VoteFair popularity ranking is to look at a matrix of pairwise counts, as shown below, and consider that the goal is to maximize the sum of the pairwise counts in the upper-right, triangular half of the matrix (which minimizes the sum of the pairwise counts in the lower-left half).
|... B||... C||... E||... F||... A||... D|
|B preferred over ...||—||56||65||74||79||84|
|C preferred over ...||37||—||60||70||75||76|
|E preferred over ...||29||35||—||64||66||72|
|F preferred over ...||19||23||30||—||55||62|
|A preferred over ...||14||18||28||37||—||50|
|D preferred over ...||9||17||21||30||41||—|
Although the goal of VoteFair popularity ranking is to find the sequence that has the highest sequence score, the scores themselves do not need to be calculated. This concept is similar to not needing to know the elevation of every point in a region in order to know that a specific mountain peak is the highest elevation in that region. By not calculating all the sequence scores, the calculation time can be greatly reduced compared to the method that calculates all the sequence scores.
Full calculation method for VoteFair popularity ranking
This is a description of the full algorithm used to calculate VoteFair popularity ranking results.
The algorithm begins by calculating the Choice-Specific Pairwise-Score (CSPS) ranking. This pre-sort is a required part of the process. Without it, some unusual cases can cause the calculations to fail to find the sequence with the highest score. This pre-sort is analogous to starting a search for the highest mountain peak within a mountain range instead of starting the search within a large valley.
The next step is to apply the VoteFair-specific variation of the insertion-sort method, including starting at the left/highest end.
To ensure that all possible moves of each choice are considered, the insertion-sort method is done in both directions. Sorting in both directions means that in some sorting passes sorting moves choices to the left. In other sorting passes sorting starts by considering the right-most choice as the first sorted choice, and choices move to the right, into the sorted portion. This convention ensures movement for choices that need to move right, instead of left, in order to cause an increase in the score.
Complications can arise when there is "circular ambiguity", so additional steps are used. The most common cases of circular ambiguity involve several choices that are tied for the same sort-order position.
A key part of dealing with circular ambiguity is to follow this convention: whenever a choice can produce the same, highest, sequence score at more than one position, the choice is moved to the farthest of those highest-sequence-score positions.
Another part of dealing with these complications is to sort the sequence multiple times.
During the second sorting pass, if there is no circular ambiguity, the sequence of the choices in the pairwise matrix remains the same. This lack of movement (when there is no circular ambiguity) occurs because the sorted and unsorted portions are adjacent. Specifically, each choice to be sorted is already at the top (for left-movement sorting) or bottom (for right-movement sorting) of the "unsorted" portion, and it is being moved to the bottom (for left-movement sorting) or top (for right-movement sorting) of the "sorted" portion. In such cases the only thing that moves is the boundary between the sorted choices and unsorted choices.
However, in cases that involve circular ambiguity, the positions of some choices will change during the second and later sorting passes. This happens because the convention (as explained above) is to move each choice as far as it will go, within the limits of maximizing the sequence score.
During the sorting passes the highest sort-order (sequence) position of each choice is tracked, and the lowest sort-order position of each choice is tracked. These highest and lowest positions are reset (to current positions) whenever the sequence score increases to a higher score. At the end of the sorting process the highest and lowest positions reveal which choices are tied at the same popularity ranking level.
Using the insertion-sort example, if choices B, C, and D can be in any order and still produce the same highest sequence score, then each of these choices would move to the left of the other two each time it is sorted, and each of these choices would have the same highest-ranked position of second place, and each would have the same lowest-ranked position of fourth place. Because these three choices have the same highest and lowest positions, they are easily identified as tied (at the same popularity ranking).
More complex cases of circular ambiguity can occur. To deal with these cases, and to ensure the correctness of the "winner" (the most popular choice), the sorting process is repeated for the top half (plus one) of the highest-ranked choices, and this sub-set sorting is repeated until there are just three choices. For example, if there are 12 choices, the sorting process is done for 12 choices, then the top 7 choices, then the top 4 choices, and finally the top 3 choices. Then the highest-ranked choice (or the choices that are tied at the top) is kept at the highest rank while the other choices are sorted a final time. (If, instead, the least-popular choice is the most important one to identify correctly, the data supplied to this algorithm can be inverted according to preference levels, and then the calculated ranking can be reversed.)
As a clarification, the extra sub-set sorting is done only if more than one sequence has the same highest sequence score. This point is significant if the distinction between VoteFair popularity ranking and the Condorcet-Kemeny method is relevant. Specifically, the Condorcet-Kemeny method does not indicate how such "tied" sequence scores should be resolved, whereas VoteFair popularity ranking resolves such "tied" sequence scores as part of its calculation process.
After all the sorting has been done, the highest and lowest ranking levels are used to determine the results. For each choice its highest and lowest ranking levels are added together (which equals twice their average) and then multiplied times a constant. The constant equals 10 times the number of choices minus one. These numbers are converted to integers, and then these "averaged scaled integerized" values are used as the non-normalized ranking levels. Two or more choices are ranked at the same level if they have the same "averaged-scaled-integerized" ranking values.
The final calculation step is to normalize the "averaged-scaled-integerized" ranking levels so that the normalized ranking levels are consecutive, namely 1, 2, 3, etc. (so that no ranking levels are skipped).
The result is a ranking that identifies which choice is first-most popular, which choice is second-most popular, and so on down to which choice is least popular. Ties can occur at any level.
The full algorithm used to calculate VoteFair popularity ranking results has a calculation time that is less than or equal to the following polynomial function:
T = A + ( B * N ) + ( C * ( N * N ) )
where T is the calculation time, N is the number of choices, and A and B and C are constants. (In mathematical notation, N * N would be written as N squared.) This function includes the calculation time required for the Choice-Specific Pairwise-Score (CSPS) pre-sort calculations.
This calculation time contrasts with the slow execution times of the "NP-hard" approach, in which every sequence score is calculated in order to find the sequence with the highest score. If every sequence score were calculated (from scratch), the calculation time would be proportional to:
N! * N * N
where N is the number of choices, N! is N factorial (2 * 3 * 4 * ... * N), and N * N equals N squared. Note that N factorial equals the number of possible sequences, and N squared times one-half approximately equals the number of pairwise counts that are added to calculate each sequence score.
This clarification about calculation time is included because there is an academically common — yet mistaken — belief that calculating the "Condorcet-Kemeny method" is "NP-hard" and cannot be calculated in a time that is proportional to a polynomial function of N (the number of choices).
The VoteFair ranking calculation methods are in the public domain, but copyright protection (see below) applies to the above description and to the software that implements VoteFair ranking.