In the previous two posts we constructed a mnemonic phrase using parts of speech and simple grammars. In this final post about generating mnemonics, rather than define grammars and enumerate parts of speech manually, we'll use a more naive approach: a Markov chain.
Here's a simple Markov chain that identifies what words follow each word in Genesis 1:1 of the King James Bible:
```clojure
(->> kjv
     (sequence
      (comp
       (take 1)
       (map :text)
       (mapcat words)
       (map str/lower-case)))
     (partition 2 1)
     (group-by first)
     (into {}
           (map (fn [[word successors]]
                  [word
                   (set (map second successors))]))))
;; => {"in" #{"the"}, "the" #{"earth" "heaven" "beginning"}, "beginning" #{"god"},
;;     "god" #{"created"}, "created" #{"the"}, "heaven" #{"and"}, "and" #{"the"}}
```
We can walk a structure like this to generate mnemonic phrases without any awareness of a phrase's grammar, relying instead on the word-patterns embedded in a text. Here's a function based on the above code:
```clojure
(defn markov [words encoder]
  (->> words
       (partition 2 1)
       (group-by first)
       (into {}
             (map (fn [[word successors]]
                    [word
                     (into #{}
                           (comp
                            (map second)
                            (if (str/blank? (encoder word))
                              (remove #(str/blank? (encoder %)))
                              (comp)))
                           successors)])))))
```
Notice the bit of added logic determining the successors of a word. A phrase such as "Woe, woe" creates a cycle in the graph that doesn't encode any digits and can cause a graph-walking algorithm to generate an infinite list of "woes". The easiest solution is to prevent non-encoding words from following each other, which also avoids runs of non-encoding words that lengthen mnemonics without actually helping.
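To see the successor filter at work, here's a toy run of markov. The map-as-encoder below is a hypothetical stand-in for major-phoneme-encode, mapping each word to its digit string:

```clojure
;; Hypothetical toy encoder: "woe" and "unto" encode no digits,
;; while "thee" encodes "1" (t -> 1 in the Major system).
(def toy-encode {"woe" "", "unto" "", "thee" "1"})

(markov ["woe" "woe" "unto" "thee"] toy-encode)
;; => {"woe" #{}, "unto" #{"thee"}}
```

Without the filter, "woe" would list itself as a successor, and a walk could emit "woe" forever without consuming a digit; with it, the cycle is gone.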
Here's the word graph built from the full text:
```clojure
(def markov-kjv
  (markov
   (sequence
    (comp
     (map :text)
     (mapcat words)
     (map str/lower-case))
    kjv)
   major-phoneme-encode))
```
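As a reminder of what the encoder does (it was built in an earlier post), major-phoneme-encode maps a word to the digit string its Major-system consonant sounds spell. Here's a crude letter-based sketch; the name naive-major-encode and the letter table below are mine, and the real version works on phonemes, not letters:

```clojure
(def letter->digit
  {\s \0, \z \0, \t \1, \d \1, \n \2, \m \3, \r \4, \l \5,
   \j \6, \k \7, \g \7, \f \8, \v \8, \p \9, \b \9})

(defn naive-major-encode
  "Letter-level stand-in for major-phoneme-encode: keeps only
  the letters that map to a Major-system digit."
  [word]
  (apply str (keep letter->digit word)))

(naive-major-encode "dare") ;; => "14"
(naive-major-encode "ye")   ;; => "" (a non-encoding word)
```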
The algorithm resembles our pattern-based generator from a previous post, recursing through the successor graph to build a lazy sequence of phrases:
```clojure
(defn generate-mnemonic [digits encoder starts successors]
  (sequence
   (comp
    (keep (fn [word]
            (let [encoding (encoder word)]
              (when (and encoding (str/starts-with? digits encoding))
                (let [remaining (subs digits (count encoding))
                      succ (successors word)]
                  (cond
                    (str/blank? remaining)
                    , [[word]]
                    (seq succ)
                    , (map #(cons word %)
                           (generate-mnemonic
                            remaining
                            encoder
                            (shuffle succ)
                            successors))
                    :else
                    , nil))))))
    (mapcat seq)
    (take 5))
   starts))
```
I've shuffled the order in which successor words are searched to add a bit of variety to the results, and I've capped the output at five results for the sake of speed.
```clojure
(generate-mnemonic
 "1414"
 major-phoneme-encode
 (keys markov-kjv)
 markov-kjv)
;; => (("dare" "to" "hire") ("dare" "to" "hoar") ("dare" "to" "wear")
;;     ("dare" "to" "her") ("dare" "to" "our"))
```
"Dare to hire" and "dare to wear" aren't bad. Here are results for our other digit sequences:
| Digits | Sample |
|---|---|
| 1414 | dare to i hear, dare to i were, dare to i here, dare to i or, dare to i are |
| 1732 | dough ye go i my way an, dough ye go i my way when, dough ye go i my own, dough ye go i my way in, dough ye go i mean |
| 2236 | hewn in ye me i wish, hewn in ye me i shew, hewn in ye me i watch, hewn in ye me i joy, hewn in ye me she |
| 2646 | owe no way which your way shew, owe no way which your joy, owe no way which your way which, owe no way which your yea joy, owe no church |
| 3183 | mad why have we may, mad why have we him, mad why have my, mad why have ye me, mad why have me |
| 31830 | mad why have me i sigh, mad why have me i say, mad why have me say, mad why have me i use, mad why have me asa |
Not all that glitters is gold. While the phrases sometimes resemble 17th-century English grammar, and some can even be construed as sensible standalone phrases (e.g., "dough, ye come when?", "mad, why have ye him?", "mad, why have my house?"), most of the results are fragmentary or nonsensical, which is more or less what you get from walking a Markov chain.
Let's see if results improve by factoring in the frequency of successors:
```clojure
(defn markov-frequencies [words encoder]
  (->> words
       (partition 2 1)
       (group-by first)
       (into {}
             (map (fn [[word successors]]
                    [word
                     (->> successors
                          (into []
                                (comp
                                 (map second)
                                 (if (str/blank? (encoder word))
                                   (remove #(str/blank? (encoder %)))
                                   (comp))))
                          frequencies
                          (sort-by second >)
                          (map first))])))))
```
```clojure
(def markov-frequencies-kjv
  (markov-frequencies
   (sequence
    (comp
     (map :text)
     (mapcat words)
     (map str/lower-case))
    kjv)
   major-phoneme-encode))
```
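Because generate-mnemonic shuffles each successor list, it would scramble this frequency ordering; to make use of it, successors need to be searched in the order given. A sketch of such a variant (the name generate-mnemonic-ordered is mine; it's generate-mnemonic with only the shuffle removed):

```clojure
(defn generate-mnemonic-ordered
  "Like generate-mnemonic, but searches successors in the order
  given, so the most frequent successor is tried first."
  [digits encoder starts successors]
  (sequence
   (comp
    (keep (fn [word]
            (let [encoding (encoder word)]
              (when (and encoding (str/starts-with? digits encoding))
                (let [remaining (subs digits (count encoding))
                      succ (successors word)]
                  (cond
                    (str/blank? remaining)
                    , [[word]]
                    (seq succ)
                    , (map #(cons word %)
                           (generate-mnemonic-ordered
                            remaining encoder succ successors))
                    :else
                    , nil))))))
    (mapcat seq)
    (take 5))
   starts))

(comment
  (generate-mnemonic-ordered
   "1414"
   major-phoneme-encode
   (keys markov-frequencies-kjv)
   markov-frequencies-kjv))
```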
Using this dictionary, generate-mnemonic produces phrases like this:
| Digits | Sample |
|---|---|
| 1414 | dare to her, dare to hear, dare to weary, dare to i here, dare to err |
| 1732 | dough ye come he any, dough ye come he no, dough ye come he know, dough ye come he now, dough ye come in |
| 2236 | hewn in me ye which, hewn in me shew, hewn in me ye shew, hewn in me which, hewn in me ye joy |
| 2646 | owe no issue he were i watch, owe no issue he were i wash, owe no issue he were i joy, owe no issue he were i shew, owe no issue he were i wish |
| 3183 | mad why have my, mad why have we him, mad why have me, mad why have we may, mad why have him |
| 31830 | mad why have him i whose, mad why have him i as, mad why have him i say, mad why have him i saw, mad why have a mess |
The results seem worse, particularly due to the prevalence of the non-encoding pronouns "you", "ye", and "I", which in some cases show up almost every other word. The variety created by shuffle seems to be a (slightly) better approach.