exupero's blog

Constructing major system mnemonics with grammar

In the previous three posts we searched for digit encodings in a body of text, but few if any of the results were self-contained or especially memorable. A good mnemonic usually relies on visual memory by creating an absurd mental picture that's hard to forget, and three or four words from an arbitrary text are unlikely to do that. Instead of searching for mnemonics, let's generate them.

I'll encode words using the phonetic major system and the CMU pronunciation dictionary from the previous post. To create a grammatically correct phrase, we can use a dictionary that includes parts of speech. Here's one option, which I've downloaded and parsed into a two-column CSV.

To start, we'll only collect nouns and verbs. We'll also limit the dictionary to words that only occur as one part of speech, to avoid some of the confusion of English words that can be used in several different ways.

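(def parts-of-speech
  {"n." #{:noun}
   "n.." #{:noun}
   "v." #{:verb}
   "a. & v." #{:verb}
   "v. i." #{:verb}
   "v. t." #{:verb}})
;; `words` is assumed to be an open reader over the two-column CSV
;; (word, part of speech); `major-phoneme-encode` is from the previous post.
(def dictionary
  (->> (line-seq words)
       (map #(str/split % #"," 2))
       ;; skip rows that have no part of speech
       (remove (comp str/blank? second))
       ;; emit [part-of-speech encoding word] for every word we can encode
       (mapcat (fn [[word pos]]
                 (when-let [encoding (major-phoneme-encode word)]
                   (map #(do [% encoding word])
                        (parts-of-speech pos #{pos})))))
       ;; keep only words listed under exactly one part of speech
       (group-by #(nth % 2))
       (filter #(= 1 (count (second %))))
       (map (comp first second))
       ;; unmapped parts of speech are still strings here; drop them
       (filter (comp keyword? first))))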

This dictionary, which only includes words that also appear in the CMU pronunciation dictionary, has about 13,000 entries, which is good enough for our purposes.
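To make the filtering steps concrete, here's a minimal sketch of the same pipeline on a handful of made-up rows. Everything prefixed with `toy-` is a hypothetical stand-in: `toy-pronunciations` replaces the CMU dictionary, `toy-encode` replaces `major-phoneme-encode`, and `toy-lines` replaces the CSV. The consonant-to-digit table is the usual major system mapping applied to ARPAbet symbols; treating ER, NG, and any other unlisted phoneme as carrying no digit is an assumption of this sketch, not necessarily what the previous post does.

```clojure
(require '[clojure.string :as str])

;; Major system digits for ARPAbet consonants. Vowels, W, Y, and HH carry no digit.
;; Assumption: any phoneme not listed here (including ER and NG) is skipped.
(def phoneme->digit
  {"S" "0", "Z" "0", "T" "1", "D" "1", "TH" "1", "DH" "1", "N" "2", "M" "3",
   "R" "4", "L" "5", "CH" "6", "JH" "6", "SH" "6", "ZH" "6", "K" "7", "G" "7",
   "F" "8", "V" "8", "P" "9", "B" "9"})

;; Hypothetical three-word pronunciation table standing in for the CMU dictionary.
(def toy-pronunciations
  {"motif" "M OW0 T IY1 F"
   "amuse" "AH0 M Y UW1 Z"
   "run"   "R AH1 N"})

(defn toy-encode
  "Stand-in for major-phoneme-encode: digits for a known word, nil otherwise."
  [word]
  (when-let [phonemes (toy-pronunciations (str/lower-case word))]
    (->> (str/split phonemes #" ")
         (map #(str/replace % #"\d" ""))   ; drop stress markers such as the 1 in IY1
         (keep phoneme->digit)
         (apply str))))

(def toy-parts-of-speech
  {"n." #{:noun}, "v. i." #{:verb}, "v. t." #{:verb}})

;; Made-up CSV rows: "Run" is both noun and verb, "Zyx" has no pronunciation.
(def toy-lines
  ["Motif,n." "Amuse,v. t." "Run,n." "Run,v. i." "Zyx,n."])

(def toy-dictionary
  (->> toy-lines
       (map #(str/split % #"," 2))
       (remove (comp str/blank? second))
       (mapcat (fn [[word pos]]
                 (when-let [encoding (toy-encode word)]
                   (map #(vector % encoding word)
                        (toy-parts-of-speech pos #{pos})))))
       (group-by #(nth % 2))                 ; group entries by word
       (filter #(= 1 (count (second %))))    ; drop words with several parts of speech
       (map (comp first second))
       (filter (comp keyword? first))))

(println (set toy-dictionary))
```

Only "Motif" and "Amuse" survive: "Run" is dropped because it appears under two parts of speech, and "Zyx" is dropped because the encoder returns nil for it.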

We'll generate words matching a given pattern recursively, but unlike in the previous posts, we'll use ordinary (non-tail) recursion rather than tail-call recursion. When we find a word of the right part of speech whose encoding matches the start of the digit sequence, we'll call the function from within itself, passing the remaining digits and the remaining parts of speech:

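(defn generate-mnemonic [digits pattern words]
  (let [[type & pattern] pattern]
    (->> words
         (keep (fn [[part-of-speech encoding word]]
                 (when (and (= part-of-speech type)
                            (str/starts-with? digits encoding))
                   (cond
                     ;; more parts of speech to fill: recurse on the remaining digits
                     (seq pattern)
                     , (map #(cons word %)
                            (generate-mnemonic
                              (subs digits (count encoding))
                              pattern
                              words))
                     ;; last word: accept it only if it uses up every digit
                     (str/blank? (subs digits (count encoding)))
                     , [[word]]
                     :else
                     , nil))))
         (mapcat seq))))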
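To watch the recursion on a small scale, here's a sketch that runs it against a five-word dictionary. The function is restated as I read it so the snippet runs on its own, and `toy-dictionary` is a hypothetical hand-encoded list, not output from the real dictionary.

```clojure
(require '[clojure.string :as str])

(defn generate-mnemonic [digits pattern words]
  (let [[type & pattern] pattern]
    (->> words
         (keep (fn [[part-of-speech encoding word]]
                 (when (and (= part-of-speech type)
                            (str/starts-with? digits encoding))
                   (let [remaining (subs digits (count encoding))]
                     (cond
                       ;; more slots to fill: recurse on what's left
                       (seq pattern)
                       (map #(cons word %)
                            (generate-mnemonic remaining pattern words))
                       ;; last slot: only accept if no digits are left over
                       (str/blank? remaining)
                       [[word]]
                       :else nil)))))
         (mapcat seq))))

;; Hypothetical hand-encoded entries: [part-of-speech encoding word].
(def toy-dictionary
  [[:noun "318" "Motif"]
   [:noun "31"  "Mud"]     ; a dead end: no verb below encodes 83 or 830
   [:verb "3"   "Aim"]
   [:verb "30"  "Amuse"]
   [:verb "30"  "Mosey"]])

(println (generate-mnemonic "3183" [:noun :verb] toy-dictionary))
(println (generate-mnemonic "31830" [:noun :verb] toy-dictionary))
```

"Mud" matches the first two digits in both calls, but since no verb covers the rest, its branch yields an empty sequence and `(mapcat seq)` drops it.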

Here's a sample of matches for the digit sequences we've been looking for:

Digits  Matches  Sample
1414    244      Thor Adhere, Dowry Adhere, Trio Adore, Otto Redraw, Dorado Err
1732    18       Teague Moan, Dogma Gnaw, Outcome Gnaw, Headache Moan, Thug Moan
2236    44       Nun Meech, Nun Mich, Ennui Enmesh, Yuen Enmesh, Ano Enmesh
2646    26       Ney Cherish, Yuen Cherish, Wain Cherish, Noah Cherish, Hun Cherish
3183    1        Motif Aim
31830   3        Motif Amuse, Motif Mosey, Motif Amaze

There are definitely some images with potential here (e.g., "Thor adhere", "headache moan"), and possibly more that aren't listed in the samples.

I've included 31830 from John Cook's post, which in my previous posts I truncated to four digits since that's all we could find in the KJV. When generating phrases, though, not only can we generate terms for the longer sequence of digits, sometimes the extra digits help, such as here where the zero lets us use longer words.

To encode a longer sequence of digits, say, the first ten digits of the golden ratio, we need to provide a longer pattern:

(generate-mnemonic "1618033988" [:noun :noun :verb :noun] dictionary)
(("Yiddish" "Atavism" "Imbue" "Fief")
 ("Tuch" "Atavism" "Imbue" "Fief")
 ("Adage" "Atavism" "Imbue" "Fief")
 ("Doge" "Atavism" "Imbue" "Fief")
 ("Thatch" "Atavism" "Imbue" "Fief")
 ("Duchy" "Atavism" "Imbue" "Fief"))

"Thatch atavism imbue fief" is obscure but almost sensible.

I used noun pairs as a stand-in for adjective-noun pairs, which works well enough in English, though of course we could also include adjectives and adverbs from the parts of speech dictionary in our generating dictionary. As further polish we could match noun and verb plurality.