Clojure

Constructing major system mnemonics with Markov chains

In the previous two posts we constructed a mnemonic phrase using parts of speech and simple grammars. In this final post about generating mnemonics, rather than define grammars and enumerate parts of speech manually, we'll use a more naive approach: a Markov chain.

Here's a simple Markov chain that identifies what words follow each word in Genesis 1:1 of the King James Bible:

(->> kjv
     (sequence
       (comp
         (take 1)
         (map :text)
         (mapcat words)
         (map str/lower-case)))
     (partition 2 1)
     (group-by first)
     (into {}
           (map (fn [[word successors]]
                  [word
                   (set (map second successors))]))))

{"in" #{"the"},
 "the" #{"earth" "heaven" "beginning"},
 "beginning" #{"god"},
 "god" #{"created"},
 "created" #{"the"},
 "heaven" #{"and"},
 "and" #{"the"}}

We can walk a structure like this to generate mnemonic phrases without any awareness of a phrase's grammar, relying instead on the word-patterns embedded in a text. Here's a function based on the above code:

(defn markov [words encoder]
  (->> words
       (partition 2 1)
       (group-by first)
       (into {}
             (map (fn [[word successors]]
                    [word
                     (into #{}
                           (comp
                             (map second)
                             (if (str/blank? (encoder word))
                               (remove #(str/blank? (encoder %)))
                               (comp)))
                           successors)])))))

Notice a bit of added logic determining the successors of a word. Phrases such as "Woe, woe" create a cycle in the graph that doesn't encode any digits and can cause a graph walking algorithm to generate an infinite lists of "woes". The easiest solution is to prevent non-encoding words from following each other, which also avoids runs of non-encoding words that make mnemonics longer without actually helping.

Here's the word graph:

(def markov-kjv
  (markov
    (sequence
      (comp
        (map :text)
        (mapcat words)
        (map str/lower-case))
      kjv)
    major-phoneme-encode))

The algorithm resembles our pattern-based generator from a previous post and uses sequence recursion:

(defn generate-mnemonic [digits encoder starts successors]
  (sequence
    (comp
      (keep (fn [word]
              (let [encoding (encoder word)]
                (when (and encoding (str/starts-with? digits encoding))
                  (let [remaining (subs digits (count encoding))
                        succ (successors word)]
                    (cond
                      (str/blank? remaining)
                      , [[word]]
                      (seq succ)
                      , (map #(cons word %)
                             (generate-mnemonic
                               remaining
                               encoder
                               (shuffle succ)
                               successors))
                      :else
                      , nil))))))
      (mapcat seq)
      (take 5))
    starts))

I've shuffled the order that successor words are searched to add a bit of variety to the results, and I've capped the list to five results for the sake of speed.

(generate-mnemonic
  "1414"
  major-phoneme-encode
  (keys markov-kjv)
  markov-kjv)

(("dare" "to" "hire")
 ("dare" "to" "hoar")
 ("dare" "to" "wear")
 ("dare" "to" "her")
 ("dare" "to" "our"))

"Dare to hire" and "dare to wear" aren't bad. Here are results for our other digit sequences:

Digits	Sample
1414	dare to i hear, dare to i were, dare to i here, dare to i or, dare to i are
1732	dough ye go i my way an, dough ye go i my way when, dough ye go i my own, dough ye go i my way in, dough ye go i mean
2236	hewn in ye me i wish, hewn in ye me i shew, hewn in ye me i watch, hewn in ye me i joy, hewn in ye me she
2646	owe no way which your way shew, owe no way which your joy, owe no way which your way which, owe no way which your yea joy, owe no church
3183	mad why have we may, mad why have we him, mad why have my, mad why have ye me, mad why have me
31830	mad why have me i sigh, mad why have me i say, mad why have me say, mad why have me i use, mad why have me asa

Not all that glitters is gold. While the phrases sometimes resemble 17th century English grammar, and some examples can even be construed as sensible standalone phrases (e.g., "dough, ye come when?", "mad, why have ye him?", "mad, why have my house?"), most of the results are fragmentary or nonsensical, which is more or less what comes out of walking a Markov chain.

Let's see if results improve by factoring in the frequency of successors:

(defn markov-frequencies [words encoder]
  (->> words
       (partition 2 1)
       (group-by first)
       (into {}
             (map (fn [[word successors]]
                    [word
                     (->> successors
                          (into []
                                (comp
                                  (map second)
                                  (if (str/blank? (encoder word))
                                    (remove #(str/blank? (encoder %)))
                                    (comp))))
                          frequencies
                          (sort-by second >)
                          (map first))])))))

(def markov-frequencies-kjv
  (markov-frequencies
    (sequence
      (comp
        (map :text)
        (mapcat words)
        (map str/lower-case))
      kjv)
    major-phoneme-encode))

Using this dictionary, generate-mnemonic produces phrases like this:

Digits	Sample
1414	dare to her, dare to hear, dare to weary, dare to i here, dare to err
1732	dough ye come he any, dough ye come he no, dough ye come he know, dough ye come he now, dough ye come in
2236	hewn in me ye which, hewn in me shew, hewn in me ye shew, hewn in me which, hewn in me ye joy
2646	owe no issue he were i watch, owe no issue he were i wash, owe no issue he were i joy, owe no issue he were i shew, owe no issue he were i wish
3183	mad why have my, mad why have we him, mad why have me, mad why have we may, mad why have him
31830	mad why have him i whose, mad why have him i as, mad why have him i say, mad why have him i saw, mad why have a mess

The results seem to be worse, in particular due to the prevalance of the non-encoding pronouns "you", "ye", and "I", which in some cases show up almost every other word. The variety created by shuffle seems to be a (slightly) better approach.

December 2022