Finding mnemonics in a body of text

John Cook wrote about mnemonics for the first few digits of the square roots of 2, 3, 5, and 7, as well as one for the reciprocal of π. Each mnemonic is a sequence of words whose lengths spell out the digits in order. For example, "I wish I knew" has word lengths 1, 4, 1, 4, which helps one remember that the square root of two is approximately 1.414.

The rhymes in John's post are catchy, but for less common digit sequences it may be too much trouble to construct a phrase with words of the right lengths. An alternative is to find a phrase you already know in a familiar body of text.

To do so, we first need to parse text into words:

(defn words [text]
  (re-seq #"[A-Za-z'-]+" text))

I've chosen to include apostrophes and hyphens as part of the word, but you're free to choose differently.
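As one example of choosing differently: the regex above only recognizes ASCII letters and the straight apostrophe, so a word printed with a typographic apostrophe (as in "Noah’s" in the Genesis 7:11 text further below) is split into "Noah" and "s". A hypothetical variant, words-unicode, that also accepts any Unicode letter (\p{L} in Java regex syntax) and the ’ character might look like this:

```clojure
(defn words-unicode
  "Hypothetical variant of words: also treats any Unicode letter and the
  typographic apostrophe as part of a word."
  [text]
  ;; \p{L} matches any Unicode letter; ' ’ and - are kept inside words.
  (re-seq #"[\p{L}'’-]+" text))

(words-unicode "In the six hundredth year of Noah’s life")
;; => ("In" "the" "six" "hundredth" "year" "of" "Noah’s" "life")
```

Note that this changes word lengths too: "Noah’s" now counts as one six-character word rather than two words of four and one characters.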

Encoding a word requires one more choice: what to do with words longer than nine characters. We could encode such a word using only the ones digit of its length (i.e., (mod (count word) 10)), or we could use both digits. I'll use both digits, in hopes that long words will let digit pairs starting with "1" be matched more often than one-letter words alone would allow. I'll make an exception for ten-character words, which I'll encode as "0" (as "reciprocal" does in John's mnemonic for the reciprocal of π), so that a 0 can follow any digit and digit pairs ending in "0" become more common:

(defn word->digits [word]
  (let [c (count word)]
    (if (= c 10)
      "0"
      (str c))))
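For comparison, the ones-digit alternative mentioned above could be written as a hypothetical word->digit-mod, which always yields exactly one digit per word:

```clojure
(defn word->digit-mod
  "Hypothetical alternative to word->digits: encode a word by the ones
  digit of its length, so every word contributes exactly one digit."
  [word]
  (str (mod (count word) 10)))

;; "reciprocal" has 10 letters, "extraordinary" 13, "wish" 4.
(map word->digit-mod ["reciprocal" "extraordinary" "wish"])
;; => ("0" "3" "4")
```

Under this scheme a thirteen-letter word can stand in for a 3, and since every word maps to one digit, a position in the concatenated digit string is also a position in the list of words.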

Let's check this code against John's mnemonics:

For the square root of 2, 1.414

(map word->digits (words "I wish I knew"))

("1" "4" "1" "4")

For the square root of 3, 1.732

(map word->digits (words "O charmed was he"))

("1" "7" "3" "2")

For the square root of 5, 2.236

(map word->digits (words "So we now strive"))

("2" "2" "3" "6")

For the square root of 7, 2.646

(map word->digits (words "It rhymes with heaven"))

("2" "6" "4" "6")

For the reciprocal of π, 0.31830

(map word->digits (words "Can I remember the reciprocal?"))

("3" "1" "8" "3" "0")

When searching a large body of text, we can check whether its digit encoding contains a given sequence of digits by concatenating the encoded words into a single string and calling .indexOf:

(defn match [digits text]
  (let [ws (words text)
        ds (transduce (map word->digits) str ws)
        i (.indexOf ds digits)]
    (when-not (neg? i)
      i)))
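One caveat: because long words contribute two digits, a hit from .indexOf is only a candidate. The digits it finds can begin or end inside a two-digit encoding, in which case no run of whole words produces them. search-kjv, further below, handles this by calling find-words after match succeeds and discarding the verse when find-words returns nil. A small illustration using hand-encoded digits for the hypothetical phrase "extraordinary I wish":

```clojure
;; Hand-encoded digits for "extraordinary I wish":
;; 13 letters -> "13", then "1", then "4".
(def encoded ["13" "1" "4"])

;; Concatenate the per-word encodings, as match does with transduce.
(def digit-string (apply str encoded)) ; => "1314"

;; .indexOf reports a hit for "314" at offset 1, but that offset falls
;; inside the two-digit encoding of "extraordinary", so no run of whole
;; words encodes 3-1-4.
(.indexOf digit-string "314") ; => 1
```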

The match function returns the index of the hit in the digit string. If each word were represented by exactly one digit, that index would also identify the first matching word. Since I've chosen to encode long words as two digits, we'll need a more complex function to find the actual words that represent the digits we want:

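(require '[clojure.string :as str])

(defn find-words [digits words]
  (loop [words words
         ds digits
         match []]
    (cond
      ;; every digit has been matched
      (str/blank? ds) match
      ;; ran out of words before completing a match
      (empty? words) nil
      :else (let [[word & words] words
                  c (word->digits word)]
              (cond
                ;; word encodes the next digits: extend the tentative match
                (str/starts-with? ds c)
                , (recur words
                         (subs ds (count c))
                         (conj match word))
                ;; mismatch mid-match: backtrack to the word after the
                ;; match's first word and start over with all the digits
                (seq match)
                , (recur (concat (drop 1 match) [word] words)
                         digits
                         [])
                ;; mismatch with nothing matched yet: try the next word
                :else
                , (recur words digits []))))))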

The function above loops over the list of words, trying to match them against a string of digits. If the string of digits is blank, we've matched all of them; if the list of words is empty, we've run out of words before finding a match. Otherwise, we check whether the next word encodes the start of the remaining digits. If it does, we add the word to our tentative match and drop its digits from the front of the string; if it doesn't, we reset the match. Note that if we reset after we've already started building a possible match, we have to backtrack: we put all but the first word of the tentative match, followed by the current word, back onto the start of the list of words, and start over with the full string of digits.

Let's see what we find when the words we want are embedded in a longer list of words:

(find-words "1414" (words "Those are things I wish I knew more about."))

["I" "wish" "I" "knew"]
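The backtracking in find-words can also be avoided. One alternative, sketched here as a hypothetical find-words-by-offset that takes the encoding function as a parameter, records the digit offset at which each word starts and accepts a hit from .indexOf only when it both starts and ends on a word boundary; otherwise it resumes the search one digit later:

```clojure
(defn find-words-by-offset
  "Hypothetical alternative to find-words. Returns the first run of words
  in ws whose encodings concatenate to exactly digits, or nil if none.
  encode maps a word to its digit string (for example, word->digits)."
  [encode digits ws]
  (let [ws      (vec ws)
        ds      (mapv encode ws)
        ;; Digit offset at which each word starts, plus the total length
        ;; (the offset just past the last word).
        starts  (reductions + 0 (map count ds))
        ;; Map each word-boundary offset back to its word index.
        at-word (zipmap starts (range))
        s       (apply str ds)]
    (loop [from 0]
      (let [i (.indexOf ^String s ^String digits (int from))]
        (when-not (neg? i)
          (let [start (at-word i)
                end   (at-word (+ i (count digits)))]
            (if (and start end)
              (subvec ws start end)
              ;; The hit straddles a two-digit encoding; keep searching.
              (recur (inc i)))))))))

(find-words-by-offset word->digits "1414"
                      (words "Those are things I wish I knew more about."))
;; => ["I" "wish" "I" "knew"]
```

One design difference: this version encodes each word exactly once, whereas find-words re-encodes the words it revisits after backtracking.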

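Now we can search a larger body of text. Here's a function that finds the first match in the King James translation of the Bible. It assumes kjv is a sequence of verse maps with :verse and :text keys, like the one in the result below: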

(defn search-kjv [digits]
  (let [digits (str digits)]
    (some (fn [{:keys [text] :as verse}]
            (when (match digits text)
              (when-let [ws (find-words digits (words text))]
                [ws verse])))
          kjv)))

(search-kjv 1414)

[["I" "even" "I" "will"]
 {:verse "Leviticus 26:28",
  :text
  "Then I will walk contrary unto you also in fury; and I, even I, will chastise you seven times for your sins."}]
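search-kjv uses some, so it stops at the first verse that matches. For a complete search, one could swap some for keep to get a lazy sequence of every match. Here is a sketch as a hypothetical search-all, which takes the word-finding function and the verses as parameters (verses are assumed to be maps with :verse and :text keys, like kjv):

```clojure
(defn search-all
  "Hypothetical variant of search-kjv: returns a lazy seq of [words verse]
  pairs for every verse in which find-fn locates digits. find-fn takes a
  digit string and a verse's text and returns the matching words or nil."
  [find-fn digits verses]
  (let [digits (str digits)]
    ;; keep drops the verses for which find-fn returns nil
    (keep (fn [{:keys [text] :as verse}]
            (when-let [ws (find-fn digits text)]
              [ws verse]))
          verses)))
```

With the functions above, (take 3 (search-all (fn [d text] (when (match d text) (find-words d (words text)))) 1414 kjv)) would return up to three matching verses, and because the sequence is lazy, only as many verses are scanned as needed.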

Here are the first matches found in the King James Version for the first four digits of each of John's mnemonics:

Digits	Text	Verse
1414	Then I will walk contrary unto you also in fury; and I, even I, will chastise you seven times for your sins.	Leviticus 26:28
1732	Now therefore restore the man his wife; for he is a prophet, and he shall pray for thee, and thou shalt live: and if thou restore her not, know thou that thou shalt surely die, thou, and all that are thine.	Genesis 20:7
2236	And God blessed the seventh day, and sanctified it: because that in it he had rested from all his work which God created and made.	Genesis 2:3
2646	In the six hundredth year of Noah’s life, in the second month, the seventeenth day of the month, the same day were all the fountains of the great deep broken up, and the windows of heaven were opened.	Genesis 7:11
3183	And I will give unto thee, and to thy seed after thee, the land wherein thou art a stranger, all the land of Canaan, for an everlasting possession; and I will be their God.	Genesis 17:8

I couldn't find a match for all five decimal places of the reciprocal of π, but matching four-digit sequences seems to be relatively easy, as evidenced by four of the five being found in just the first 20 chapters of Genesis.

Admittedly, these are terrible mnemonics. Matches start in the middle of sentences, extend across clauses, and stop unexpectedly. It's hard to tell which words represent the digits you're trying to remember, and it would be easier just to remember the digits themselves. A complete search of matching verses might surface a few memorable phrases, but given that John's mnemonics rely on poetic grammar (e.g. "O charmed was he"), the odds of finding good standalone phrases in a text not written to be encoded this way may be slim.

In the next post we'll look at a different digit encoding scheme.

December 2022