exupero's blog

EDN data DSL

I keep some loosely structured habit and travel information in a set of EDN files. For a while I used an actual database, but I quickly discovered I was spending a lot of time tinkering with the UI for inputting data. I also tried spreadsheets, but the data is sparse and awkward in a tabular format, and spreadsheets aren't very easy to query. The scheme I've settled on is a folder of EDN files containing Clojure code that, when evaluated, produces a list of maps.

To parse an individual file, I wrap its contents in square brackets and read it as a vector of EDN forms:

(defn parse-file [s]
  (clojure.edn/read-string {} (format "[%s]" s)))
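For example, two top-level forms in a file come back as a two-element vector (parse-file is repeated here so the snippet runs on its own):

```clojure
(require 'clojure.edn)

;; As defined above: wrap the file's contents in brackets
;; and read everything as a single vector of EDN forms.
(defn parse-file [s]
  (clojure.edn/read-string {} (format "[%s]" s)))

(parse-file "{:a 1} {:b 2}")
;; => [{:a 1} {:b 2}]
```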

Before evaluating a file, I bind the date in the filename to a dynamic var, then I evaluate the EDN forms as if they were Clojure code:

(def ^:dynamic *date*)
(defn eval-file [nmsp file]
  (binding [*date* (re-find #"\d{4}-\d{2}-\d{2}" (.getName file))]
    (->> (slurp file)
         parse-file
         (mapv (fn [form]
                 (binding [*ns* nmsp]
                   (eval form))))
         (remove var?)
         (mapv #(cond-> %
                  (not (:date %)) (assoc :date *date*))))))

In reality, EDN is only a subset of Clojure syntax. I can't use Clojure's reader macros, such as @ for dereferencing or #(...) for anonymous functions, because the EDN reader rejects them. But ordinary macros and special forms such as defn and def do work, so after evaluating the forms I discard any vars they produce. Finally, for any map without a :date field, I add the date given by the filename.
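As a concrete sketch of the whole pipeline (the definitions above are repeated so the snippet stands alone; the file contents are illustrative):

```clojure
(require 'clojure.edn)

(def ^:dynamic *date*)

(defn parse-file [s]
  (clojure.edn/read-string {} (format "[%s]" s)))

(defn eval-file [nmsp file]
  (binding [*date* (re-find #"\d{4}-\d{2}-\d{2}" (.getName file))]
    (->> (slurp file)
         parse-file
         (mapv (fn [form]
                 (binding [*ns* nmsp]
                   (eval form))))
         (remove var?)
         (mapv #(cond-> %
                  (not (:date %)) (assoc :date *date*))))))

;; A dated temp file containing a def (whose var is discarded),
;; a reference to it, and a map that carries its own :date:
(def f (java.io.File/createTempFile "2024-03-10-log" ".edn"))
(spit f "(def home {:lat 43 :lon -85})\nhome\n{:date \"2024-03-09\" :note \"late entry\"}")

(eval-file *ns* f)
;; => [{:lat 43, :lon -85, :date "2024-03-10"}
;;     {:date "2024-03-09", :note "late entry"}]
```

The var from the def is dropped, the map that lacked a :date gets the filename's date, and the map that already had one keeps it.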

A lot of power hides in that eval. Though I'm generating a list of maps, I typically don't write out the maps explicitly; instead, having Clojure available, I define functions that return maps, then call those functions. For example, instead of writing

{:lat 43 :lon -85}

I'll write

(defn location [lat lon]
  {:lat lat :lon lon})
(location 43 -85)

Defining and invoking functions like this keeps the maps for specific pieces of information consistently structured, and makes it easy to change details of a map, such as renaming keys (e.g., changing :lat to :latitude). It also guards against typos: eval won't notice a misspelled keyword, but it will throw an error if I misspell the name of a function.
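To see the difference directly with eval:

```clojure
;; A misspelled keyword slips through silently:
(eval '{:latitute 43})
;; => {:latitute 43}

;; but a misspelled function name fails loudly:
(try (eval '(locaton 43 -85))
     (catch Exception _ :eval-error))
;; => :eval-error
```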

Using this style, I've built up a DSL of common entries. A typical file looks something like this:

(location 43 -85)
(outdoors 1030 1130)
(-> (walk)
    (route "park.geojson")
    (note "warm day, unusually busy"))
(no-tv)

Functions like route and note are modifier functions that add optional data to maps, and having most of the power of Clojure makes it easy to chain modifiers with ->.
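The definitions behind a file like that aren't shown here, but they might look something like this (the names and map shapes are my assumptions, sketched to match the sample file):

```clojure
;; Hypothetical entry and modifier functions; the actual
;; definitions may differ.
(defn walk []
  {:activity :walk})

(defn route [m path]
  (assoc m :route path))

(defn note [m text]
  (assoc m :note text))

(-> (walk)
    (route "park.geojson")
    (note "warm day, unusually busy"))
;; => {:activity :walk, :route "park.geojson", :note "warm day, unusually busy"}
```

Because each modifier takes the map as its first argument, they thread naturally with ->.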

Another subtlety in eval-file is that forms can refer to *date*. I don't reference it directly, but I do have tagged literals, #yesterday and #tomorrow, whose reader functions rebind *date* to the previous or next date, in case I want to refer to events that extend beyond the date named by the file. To support them, I supply :readers in parse-file:

(defn parse-file [s]
  (clojure.edn/read-string
    {:readers {'yesterday yesterday
               'tomorrow tomorrow}}
    (format "[%s]" s)))
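The yesterday and tomorrow reader functions themselves aren't shown above; one possible sketch (assuming *date* holds an ISO date string, and using a hypothetical shift-date helper) wraps each tagged form so that *date* is rebound while it evaluates:

```clojure
(def ^:dynamic *date*) ; as defined earlier

;; Hypothetical helper: shift an ISO date string by n days.
(defn shift-date [date-str n]
  (str (.plusDays (java.time.LocalDate/parse date-str) n)))

;; Each reader receives the tagged form and returns code that
;; rebinds *date* around it.
(defn yesterday [form]
  `(binding [*date* (shift-date *date* -1)]
     ~form))

(defn tomorrow [form]
  `(binding [*date* (shift-date *date* 1)]
     ~form))
```

With these, #yesterday (walk) evaluates (walk) as if the file were dated one day earlier.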

Here's the main entry point, which reads the files in a directory:

(defn read-data [directory]
  (let [nmsp *ns*]
    (->> (file-seq (clojure.java.io/file directory))
         (filter #(re-find #"\d{4}-\d{2}-\d{2}" (.getName %)))
         (sort-by #(.getName %))
         (mapcat #(eval-file nmsp %)))))

Note that the files are evaluated in the current namespace, which means a function defined within one file is available in all subsequent files. Alternatively, you can group function definitions in a library file and evaluate it before processing the data files.

Querying this DB is as easy as getting a list of maps from read-data and filtering them with Clojure's built-in collection functions. The above code can all be run in Babashka, so I have a suite of bb tasks that print reports on various aspects of the data (such as how much time I spend outdoors). Those plain-text reports can then be piped to commands that generate data visualizations.
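For instance, given the kind of maps read-data returns (these sample entries are illustrative):

```clojure
;; Illustrative sample of what read-data might return:
(def entries
  [{:date "2024-03-10" :activity :walk :route "park.geojson"}
   {:date "2024-03-10" :lat 43 :lon -85}
   {:date "2024-03-11" :activity :walk}])

;; Ordinary collection functions are the query language:
(->> entries
     (filter #(= :walk (:activity %)))
     (map :date))
;; => ("2024-03-10" "2024-03-11")
```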

Overall, this setup allows me to have a low-maintenance log that produces well-structured, queryable data, yet is also sufficiently human-readable.