exupero's blog

# Constructing major system mnemonics with Markov chains

In the previous two posts we constructed a mnemonic phrase using parts of speech and simple grammars. In this final post about generating mnemonics, rather than define grammars and enumerate parts of speech manually, we'll use a more naive approach: a Markov chain.

Here's a simple Markov chain that identifies what words follow each word in Genesis 1:1 of the King James Bible:

``````(->> kjv
     (sequence
       (comp
         (take 1)
         (map :text)
         (mapcat words)
         (map str/lower-case)))
     (partition 2 1)
     (group-by first)
     (into {}
           (map (fn [[word successors]]
                  [word
                   (set (map second successors))]))))``````
``````{"in" #{"the"},
"the" #{"earth" "heaven" "beginning"},
"beginning" #{"god"},
"god" #{"created"},
"created" #{"the"},
"heaven" #{"and"},
"and" #{"the"}}
``````
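For instance, a map like this can be walked by picking a random successor at each step. A minimal sketch (the `walk` helper is hypothetical, not from the post):

```clojure
;; Walk the successor map by choosing a random next word at each step,
;; stopping when a word has no successors (e.g. "earth" above).
(defn walk [graph start n]
  (take n (take-while some?
                      (iterate #(some-> (graph %) vec rand-nth) start))))

(walk {"in" #{"the"}, "the" #{"earth" "heaven" "beginning"},
       "beginning" #{"god"}, "god" #{"created"},
       "created" #{"the"}, "heaven" #{"and"}, "and" #{"the"}}
      "in" 5)
;; e.g. ("in" "the" "beginning" "god" "created")
```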

We can walk a structure like this to generate mnemonic phrases without any awareness of a phrase's grammar, relying instead on the word-patterns embedded in a text. Here's a function based on the above code:

``````(defn markov [words encoder]
  (->> words
       (partition 2 1)
       (group-by first)
       (into {}
             (map (fn [[word successors]]
                    [word
                     (into #{}
                           (comp
                             (map second)
                             (if (str/blank? (encoder word))
                               (remove #(str/blank? (encoder %)))
                               (comp)))
                           successors)])))))``````

Notice the added logic for determining a word's successors. Phrases such as "Woe, woe" create a cycle in the graph that doesn't encode any digits and can cause a graph-walking algorithm to generate an infinite list of "woes". The easiest solution is to prevent non-encoding words from following each other, which also avoids runs of non-encoding words that make mnemonics longer without actually helping.
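To see the filter at work, apply `markov` to a tiny word list with a stub encoder (both the input and the encoder here are hypothetical) under which only "unto" encodes digits:

```clojure
;; Stub encoder: only "unto" encodes anything.
(defn stub-encode [w] (when (= w "unto") "21"))

(markov ["woe" "woe" "unto" "me"] stub-encode)
;; => {"woe" #{"unto"}, "unto" #{"me"}}
```

Because "woe" encodes nothing, its non-encoding successors are filtered out; without the filter, "woe" would also list itself as a successor, creating the cycle.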

Here's the word graph:

``````(def markov-kjv
  (markov
    (sequence
      (comp
        (map :text)
        (mapcat words)
        (map str/lower-case))
      kjv)
    major-phoneme-encode))``````
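`major-phoneme-encode` comes from the earlier posts in this series. As a rough stand-in, a letter-based (rather than phoneme-based) major-system encoder looks something like this; it mishandles silent letters and digraphs, so take it only as a sketch of the mapping's shape:

```clojure
;; Rough letter-based major-system mapping (hypothetical stand-in for
;; the phoneme-based encoder from the earlier posts).
(def letter->digit
  {\s "0" \z "0" \t "1" \d "1" \n "2" \m "3" \r "4" \l "5"
   \j "6" \g "7" \k "7" \c "7" \q "7" \f "8" \v "8" \p "9" \b "9"})

(defn rough-major-encode [word]
  (apply str (keep letter->digit word)))

(rough-major-encode "dare") ;; => "14"
```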

The algorithm resembles our pattern-based generator from a previous post and uses sequence recursion:

``````(defn generate-mnemonic [digits encoder starts successors]
  (sequence
    (comp
      (keep (fn [word]
              (let [encoding (encoder word)]
                (when (and encoding (str/starts-with? digits encoding))
                  (let [remaining (subs digits (count encoding))
                        succ (successors word)]
                    (cond
                      (str/blank? remaining)
                      , [[word]]
                      (seq succ)
                      , (map #(cons word %)
                             (generate-mnemonic
                               remaining
                               encoder
                               (shuffle succ)
                               successors))
                      :else
                      , nil))))))
      (mapcat seq)
      (take 5))
    starts))``````

I've shuffled the order that successor words are searched to add a bit of variety to the results, and I've capped the list to five results for the sake of speed.

``````(generate-mnemonic
  "1414"
  major-phoneme-encode
  (keys markov-kjv)
  markov-kjv)``````
``````(("dare" "to" "hire")
("dare" "to" "hoar")
("dare" "to" "wear")
("dare" "to" "her")
("dare" "to" "our"))
``````

"Dare to hire" and "dare to wear" aren't bad. Here are results for our other digit sequences:

| Digits | Sample |
| --- | --- |
| 1414 | dare to i hear, dare to i were, dare to i here, dare to i or, dare to i are |
| 1732 | dough ye go i my way an, dough ye go i my way when, dough ye go i my own, dough ye go i my way in, dough ye go i mean |
| 2236 | hewn in ye me i wish, hewn in ye me i shew, hewn in ye me i watch, hewn in ye me i joy, hewn in ye me she |
| 2646 | owe no way which your way shew, owe no way which your joy, owe no way which your way which, owe no way which your yea joy, owe no church |
| 31830 | mad why have me i sigh, mad why have me i say, mad why have me say, mad why have me i use, mad why have me asa |

Not all that glitters is gold. While the phrases sometimes resemble 17th century English grammar, and some examples can even be construed as sensible standalone phrases (e.g., "dough, ye come when?", "mad, why have ye him?", "mad, why have my house?"), most of the results are fragmentary or nonsensical, which is more or less what comes out of walking a Markov chain.

Let's see if results improve by factoring in the frequency of successors:

``````(defn markov-frequencies [words encoder]
  (->> words
       (partition 2 1)
       (group-by first)
       (into {}
             (map (fn [[word successors]]
                    [word
                     (->> successors
                          (into []
                                (comp
                                  (map second)
                                  (if (str/blank? (encoder word))
                                    (remove #(str/blank? (encoder %)))
                                    (comp))))
                          frequencies
                          (sort-by second >)
                          (map first))])))))``````
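On a toy input with a stub encoder under which every word encodes a digit, so nothing gets filtered (both the input and the stub are hypothetical), the most frequent successor sorts first:

```clojure
;; Stub encoder: every word "encodes" something, so no filtering occurs.
(defn encode-all [_] "1")

(get (markov-frequencies ["to" "be" "or" "not" "to" "be" "to" "go"]
                         encode-all)
     "to")
;; => ("be" "go") since "be" follows "to" twice and "go" once
```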
``````(def markov-frequencies-kjv
  (markov-frequencies
    (sequence
      (comp
        (map :text)
        (mapcat words)
        (map str/lower-case))
      kjv)
    major-phoneme-encode))``````

Using this dictionary, `generate-mnemonic` produces phrases like this:

| Digits | Sample |
| --- | --- |
| 1414 | dare to her, dare to hear, dare to weary, dare to i here, dare to err |
| 1732 | dough ye come he any, dough ye come he no, dough ye come he know, dough ye come he now, dough ye come in |
| 2236 | hewn in me ye which, hewn in me shew, hewn in me ye shew, hewn in me which, hewn in me ye joy |
| 2646 | owe no issue he were i watch, owe no issue he were i wash, owe no issue he were i joy, owe no issue he were i shew, owe no issue he were i wish |

The results seem to be worse, in particular due to the prevalence of the non-encoding pronouns "you", "ye", and "I", which in some cases show up almost every other word. The variety created by `shuffle` seems to be a (slightly) better approach.