August 4, 2022
5 min read▪︎

Five Five-letter Words with Twenty-Five Unique Letters

My “giving-it-a-go” on an a puzzle by Matt Parker.

The Problem

The puzzle was originally mentioned in this video by Matt Parker.

The goal is to find a set of five five-letter words that have twenty-five unique letters. I thought the puzzle was interesting enough to try to solve it myself, but when Parker said that his solution took a whopping 31.95 days to compute, I got nerd sniped. There has got to be a faster way!

Parker used a word list of ~340,000 English wordsWhile this list does not include every word, it is good enough for this purpose.. As seen in the video, some of the words in the word list were questionable at best. There were words such as FLDXT, HDQRS, EXPWY, (short for fluid extraxt, headquarters, and expressway). Due to this, Parker later decided to use the word list of Wordle as an authoritative source of proper words. I will be using the original word list of 340,000 words.

My Solution

My solution is JavaScript-based (full code in the Appendix). I first downloaded the same word list that Parker used, and loaded it into a JavaScript array. The first step is to filter out all words that are not five letters long. Next, any words that have duplicate characters in them, e.g. CLASS, need to be filtered out as well, as they break the rules of the puzzle.

index.js
const fs = require("fs");

// Read the word list into an array
let words = fs.readFileSync("words-alpha.txt", "utf8").split("\n");

// Remove all characters that are not 5 letters long
words = words.filter((word) => word.length === 5);

// Remove words that have duplicate characters
words = words.filter((word) => new Set(word).size === word.length);

To make the algorithm run faster, a couple of things can be optimized here. The first optimization idea is to remove redundant anagram words, as Parker did in the video. The word list contains multiple anagrams, e.g. study is equivalent to dusty in terms of characters used. Duplicate anagram words can be removed by creating an index, where each word is transformed by sorting its letters in an alphabetical order. For example, the word SLATE becomes AELST, etc. These sorted will be used to find all anagram words. For example, dusty will have the same index as study (dstuy).

The original length of the word list was 12,972 five-letter words. After removing all words with duplicate characters, and redundant anagram words, the length of the word list was reduced to 5,182. This will already make the task a bit faster to solve.

The second optimization idea is to represent the words as bitsets. As there are only 26 letters in the English alphabet, we can think of each letter as a number between 0 and 25, a to z. Bitset operations are significantly faster than string operations. For example, instead of checking if two strings share common characters by iterating through every character in both strings, we can now simply check if the two bitsets intersect. For bitset operations, I used the FastBitSet.js library.

Creating the word index
const FastBitSet = require("fastbitset");

const wordIndex = [];

const stringToNumberArray = (s) => {
  const arr = [];
  for (const char of s) {
    arr.push(char.charCodeAt(0) - 97);
  }
  return arr;
};

for (const word of words) {
  wordIndex.push({
    word,
    index: word.split("").sort().join(""),
    bitIndex: new FastBitSet(stringToNumberArray(word)),
  });
}
Removing anagram words
const indexValues = wordIndex.map((word) => word.index);
const indexSet = [...new Set(indexValues)];

const uniqueIndex = [];

for (const word of wordIndex) {
  if (
    indexSet.includes(word.index) &&
    !uniqueIndex.find((item) => item.index === word.index)
  ) {
    uniqueIndex.push(word);
  }
}

The Algorithm

A recursion-based algorithm is used to find any possible sequences.

// Filter out words from the index that have common characters with the sequence
const filterIndex = (sequence, index) => {
  return index.filter((item) => {
    const itemIDX = item.bitIndex;
    return !itemIDX.intersects(sequence.bitIndex);
  });
};

const findWords = (sequence, index) => {
  if (sequence.word.length === 25) {
    foundSequences.push(sequence.word);
  }

  const searchSpace = filterIndex(sequence, index);

  for (let i = 0; i < searchSpace.length; i++) {
    findWords(
      {
        word: sequence.word + searchSpace[i].word,
        bitIndex: sequence.bitIndex.new_union(searchSpace[i].bitIndex),
      },
      searchSpace.slice(i + 1)
    );
  }
};

for (let i = 0; i < uniqueIndex.length; i++) {
  console.log(`${i + 1}/${uniqueIndex.length}`);
  findWords(uniqueIndex[i], uniqueIndex.slice(i + 1));
}

The algorithm iterates through every word in uniqueIndex. For each word, it calls findWords, a recursive function that finds any valid sequences recursively. findWords takes two parameters: the current sequence of words, and the remaining words in the index. The slice method is used to remove the current – and already-tried words from the index, as there is no need to try them again with other combinations.

At each step, findWords filters the remaining words in the index by removing any words that have common characters with the current sequence by using the filterIndex function. filterIndex checks if the bitset of the sequence and the bitset of the word interset. If they do, they share common bits, that is, letters.

If the length of the sequence gets to 25, a valid sequence has been discovered. Valid sequences are added to the foundSequences array.

Benchmark

My solution took 53.6 seconds to run on a 2018 MacBook Pro. All in all, 538 unique sequences were found (including fjord – gucks – nymph – vibex – waltz that Parker found in the video). However, it is worth pointing out that due to the removal of anagram words, there exist more than 538 unique sequences. However, it is trivial to recover these missing sequences, as we still have the original wordIndex that has not been filtered.

Comparing this to the original running time of 31.95 days, the importance of optimization becomes evident.

Appendix

Execute the code by running the following commands in the terminal.

Terminal
npm install
node index.js