## Insertion-sort algorithm applied to finding maximum sequence score

This explanation clarifies how the well-known insertion-sort algorithm is applied to VoteFair popularity ranking in a way that greatly reduces the number of calculations needed to find maximum sequence scores. (This method is just part of the full algorithm, which is explained in the full calculation details for VoteFair popularity ranking.)

(The algorithm described here assumes that the choices already have been pre-sorted using the Choice-Specific Pairwise-Score (CSPS) algorithm.)

Consider an example in which there are five choices named A, B, C, D, and E, with a final sort order that matches this alphabetical order.

Notation: The notation A>B refers to how many voters pairwise prefer choice A over choice B, and the notation B>A refers to how many voters pairwise prefer choice B over choice A. This notation always uses the "greater-than" symbol ">", and never uses the "less-than" symbol "<".

At an intermediate stage in this sorting example, suppose that the choices A, C, and E have been sorted — into this correct order — and choice B is about to be sorted, and choice D remains unsorted. The pairwise counts for this arrangement are shown below. The asterisks show the separation between sorted choices and unsorted choices.

| | | | | | | A | C | E | B | D | | | | | | | -----*************************-------+-------+ * \ | | * | | A * \ | A>C | A>E * A>B | A>D | * \ | | * | | -----+-------+-------+-------+-------+-------+ * | \ | * | | C * C>A | \ | C>E * C>B | C>D | * | \ | * | | -----+-------+-------+-------+-------+-------+ * | | \ * | | E * E>A | E>C | \ * E>B | E>D | * | | \ * | | -----*************************-------+-------+ | | | | \ | | B | B>A | B>C | B>E | \ | B>D | | | | | \ | | -----+-------+-------+-------+-------+-------+ | | | | | \ | D | D>A | D>C | D>E | D>B | \ | | | | | | \ | -----+-------+-------+-------+-------+-------+

The diagonal line passes through empty cells — that would otherwise represent a choice's comparison with itself, such as A>A.

The diagonal line also is the border between the upper-right triangular area and the lower-left triangular area. The sequence score for the current sequence is the sum of all the pairwise counts in the upper-right triangular area (currently A>C + A>E + A>B + A>D + C>E + C>B + C>D + E>B + E>D + B>D).

The goal of these calculations is to find the maximum sequence score, which means the goal is to change the sequence so that the largest pairwise counts move into the upper-right triangular area, leaving the smallest pairwise counts in the lower-left triangular area.

The first step in sorting choice B is to consider the possibility of moving it to the left of choice E, which would form the sequence A, C, B, E. Here is the pairwise-count matrix for this sequence. The asterisks now include choice B because this is a possible sort order.

| | | | | | | A | C | B | E | D | | | | | | | -----*********************************-------+ * \ | | | * | A * \ | A>C | A>B | A>E * A>D | * \ | | | * | -----*-------+-------+-------+-------*-------+ * | \ | | * | C * C>A | \ | C>B | C>E * C>D | * | \ | | * | -----*-------+-------+-------+-------*-------+ * | | \ | * | B * B>A | B>C | \ | B>E * B>D | * | | \ | --- * | -----*-------+-------+-------+-------*-------+ * | | | \ * | E * E>A | E>C | E>B | \ * E>D | * | | --- | \ * | -----*********************************-------+ | | | | | \ | D | D>A | D>C | D>B | D>E | \ | | | | | | \ | -----+-------+-------+-------+-------+-------+

The only pairwise counts that crossed the diagonal line are the (underlined) counts B>E and E>B, which swapped places. All the other pairwise counts that move do not cross the diagonal line; they stay on the same side of the diagonal line.

As a result, the score for this sequence, compared to the score for the previous sequence, increases (or decreases if negative) by the amount B>E minus E>B. In this case (assuming there are no complications that are explained later) the sequence score has increased because in the final (alphabetical) sort order, choice B appears before choice E.

The next step in sorting choice B is to consider the possibility of moving it to the left of choice C, which would form the sequence A, B, C, E. Here is the pairwise-count matrix for this sequence.

| | | | | | | A | B | C | E | D | | | | | | | -----*********************************-------+ * \ | | | * | A * \ | A>B | A>C | A>E * A>D | * \ | | | * | -----*-------+-------+-------+-------*-------+ * | \ | | * | B * B>A | \ | B>C | B>E * B>D | * | \ | --- | * | -----*-------+-------+-------+-------*-------+ * | | \ | * | C * C>A | C>B | \ | C>E * C>D | * | --- | \ | * | -----*-------+-------+-------+-------*-------+ * | | | \ * | E * E>A | E>B | E>C | \ * E>D | * | | | \ * | -----*********************************-------+ | | | | | \ | D | D>A | D>B | D>C | D>E | \ | | | | | | \ | -----+-------+-------+-------+-------+-------+

The only pairwise counts that crossed the diagonal line are the (underlined) counts B>C and C>B, which swapped places. The other pairwise counts that moved remained on the same side of the diagonal line.

The score for this sequence increases (or decreases if negative) by the amount B>C minus C>B. In this case the sequence score has increased because (in the final alphabetical order) choice B appears before choice C.

The final step in sorting choice B is to consider the possibility of moving it to the left of choice A, which would form the sequence B, A, C, E. Here is the matrix for this sequence.

| | | | | | | B | A | C | E | D | | | | | | | -----*********************************-------+ * \ | | | * | B * \ | B>A | B>C | B>E * B>D | * \ | --- | | * | -----*-------+-------+-------+-------*-------+ * | \ | | * | A * A>B | \ | A>C | A>E * A>D | * --- | \ | | * | -----*-------+-------+-------+-------*-------+ * | | \ | * | C * C>B | C>A | \ | C>E * C>D | * | | \ | * | -----*-------+-------+-------+-------*-------+ * | | | \ * | E * E>B | E>A | E>C | \ * E>D | * | | | \ * | -----*********************************-------+ | | | | | \ | D | D>B | D>A | D>C | D>E | \ | | | | | | \ | -----+-------+-------+-------+-------+-------+

The only pairwise counts that crossed the diagonal line are the (underlined) counts B>A and A>B, which swapped places. The other pairwise counts that moved remained on the same side of the diagonal line.

The score for this sequence increases (or decreases if negative) by the amount B>A minus A>B. In this case the sequence score has decreased because (in the final alphabetical order) choice B appears after, not before, choice A.

At this point choice B has been tested at each position within the sorted portion. The maximum sequence score (for the sorted portion) occurred when it was between choices A and C. As a result, choice B will be moved to the position between choices A and C

Notice that the full sequence score did not need to be calculated in order to find this "local" maximum. These calculations only need to keep track of increases and decreases that occur as the being-sorted choice swaps places with successive already-sorted choices.

The pairwise-count matrix with choice B in the second sort-order position (between A and C) is shown below. Now choice D is the only unsorted choice.

| | | | | | | A | B | C | E | D | | | | | | | -----*********************************-------+ * \ | | | * | A * \ | A>B | A>C | A>E * A>D | * \ | | | * | -----*-------+-------+-------+-------*-------+ * | \ | | * | B * B>A | \ | B>C | B>E * B>D | * | \ | | * | -----*-------+-------+-------+-------*-------+ * | | \ | * | C * C>A | C>B | \ | C>E * C>D | * | | \ | * | -----*-------+-------+-------+-------*-------+ * | | | \ * | E * E>A | E>B | E>C | \ * E>D | * | | | \ * | -----*********************************-------+ | | | | | \ | D | D>A | D>B | D>C | D>E | \ | | | | | | \ | -----+-------+-------+-------+-------+-------+

Choice D would be sorted in the same way. Of course the maximum sequence score would occur when choice D is between choices C and E, so D is moved there.

| | | | | | | A | B | C | D | E | | | | | | | -----***************************************** * \ | | | | * A * \ | A>B | A>C | A>D | A>E * * \ | | | | * -----*-------+-------+-------+-------+-------* * | \ | | | * B * B>A | \ | B>C | B>D | B>E * * | \ | | | * -----*-------+-------+-------+-------+-------* * | | \ | | * C * C>A | C>B | \ | C>D | C>E * * | | \ | | * -----*-------+-------+-------+-------+-------* * | | | \ | * D * D>A | D>B | D>C | \ | D>E * * | | | \ | * -----*-------+-------+-------+-------+-------* * | | | | \ * E * E>A | E>B | E>C | E>D | \ * * | | | | \ * -----*****************************************

Now there are no more choices to sort, so the resulting sequence is A, B, C, D, E. In this sequence the full sequence score — which equals A>B + A>C + A>D + A>E + B>C + B>D + B>E + C>D + C>E + D>E — is likely to be the highest possible sequence score.

Additional calculations, which are described in the full calculation details for VoteFair popularity ranking, are needed because in rare cases it is possible that moving two or more choices at the same time could produce a higher sequence score. This concept is analogous to climbing a mountain in foggy conditions by always heading in the locally higher direction and ending up at the top of a peak and then, when the fog clears, seeing a higher peak.

The VoteFair ranking calculation methods are in the public domain, but copyright protection (see below) applies to the above description and to the software that implements VoteFair ranking.