exupero's blog

EDN data DSL

I keep some loosely structured habit and travel information in a set of EDN files. For a while I used an actual database, but I quickly discovered that I spent more time tinkering with the UI for inputting data than entering it. I also tried spreadsheets, but the data is sparse and awkward in a tabular format, and spreadsheets aren't easy to query. The scheme I've settled on is a folder of EDN files containing Clojure code, which when evaluated produces a list of maps.

To parse an individual file, I wrap its contents in square brackets and read it as a vector of EDN forms:

(defn parse-string [s]
  (clojure.edn/read-string {} (format "[%s]" s)))
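
For example, a string containing two top-level forms reads as a two-element vector. Note that a form like a function call comes back as plain data; nothing is evaluated yet:

```clojure
(parse-string "{:lat 43 :lon -85}\n(location 43 -85)")
;=> [{:lat 43, :lon -85} (location 43 -85)]
```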

Before evaluating a file, I bind the date in its filename to a dynamic var; then I parse the file's contents and evaluate the resulting EDN forms as if they were Clojure code:

(def ^:dynamic *date*)

(defn eval-file [nmsp file]
  (binding [*date* (re-find #"\d{4}-\d{2}-\d{2}" (.getName file))]
    (->> (slurp file)
         parse-string
         (mapv (fn [form]
                 (binding [*ns* nmsp]
                   (eval form))))
         (remove var?)
         (mapv #(cond-> %
                  (not (:date %)) (assoc :date *date*))))))

In reality, EDN is only a subset of Clojure's syntax. I can't use Clojure's reader macros, such as @ for dereferencing atoms or #(...) for anonymous functions. But ordinary forms such as def and defn do work, so after evaluating the forms I discard any vars they return. Finally, for any map without a :date field, I add the date given by the filename.
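
As a minimal illustration, suppose a file named 2024-05-01.edn (a hypothetical example) contains:

```clojure
;; Hypothetical file contents for 2024-05-01.edn
(def home {:lat 43 :lon -85})
home
```

Evaluating it produces a var (discarded) and a map, so eval-file returns [{:lat 43 :lon -85 :date "2024-05-01"}]: the map picks up the filename's date automatically.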

A lot of power hides in that eval. Though I'm generating a list of maps, I typically don't write out the maps explicitly; instead, having Clojure available, I define functions that return maps, then call those functions. For example, instead of writing

{:lat 43 :lon -85}

I'll write

(defn location [lat lon]
  {:lat lat :lon lon})
(location 43 -85)

Defining and invoking functions like this keeps the maps for specific kinds of information consistently structured, and lets me change details of a map in one place, such as renaming keys (e.g., :lat to :latitude). It also guards against typos: eval won't catch a misspelled keyword, but it will throw an error if I misspell a function name.

Using this style, I've built up a DSL of common entries. A typical file looks something like this:

(location 43 -85)
(outdoors 1030 1130)
(-> (walk)
    (route "park.geojson")
    (note "warm day, unusually busy"))

Functions like route and note are modifier functions that add optional data to maps, and having most of the power of Clojure makes it easy to chain modifiers with ->.
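
The post doesn't show these definitions, but a plausible sketch is a constructor that returns a base map plus modifiers that assoc optional keys onto it (the :activity, :route, and :note keys here are illustrative assumptions):

```clojure
;; Illustrative sketch; the real definitions aren't shown in the post.
(defn walk []
  {:activity :walk})

(defn route [m geojson-file]
  (assoc m :route geojson-file))

(defn note [m text]
  (assoc m :note text))

(-> (walk)
    (route "park.geojson")
    (note "warm day, unusually busy"))
;=> {:activity :walk, :route "park.geojson", :note "warm day, unusually busy"}
```

Because each modifier takes the map as its first argument and returns it, they compose naturally with ->.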

Another subtlety in eval-file is that forms can refer to *date*. I don't reference it directly, but I do have tagged literals #yesterday and #tomorrow that rebind *date* to the day before or after, in case I want to record events that extend beyond the date named by the file. To support them, I supply :readers to parse-string:

(defn parse-string [s]
  (clojure.edn/read-string
   {:readers {'yesterday yesterday
              'tomorrow tomorrow}}
   (format "[%s]" s)))
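
The reader functions themselves aren't shown in the post. One possible sketch: each reader wraps the tagged form in a binding that shifts *date* (defined earlier), assuming ISO-formatted dates; the shift helper is hypothetical:

```clojure
;; Hypothetical sketch of the #yesterday and #tomorrow readers.
(defn shift [date days]
  (str (.plusDays (java.time.LocalDate/parse date) days)))

(defn yesterday [form]
  ;; Wrap the form so it evaluates with *date* rebound to the previous day.
  `(binding [*date* (shift *date* -1)] ~form))

(defn tomorrow [form]
  `(binding [*date* (shift *date* 1)] ~form))
```

A reader function receives the tagged form at read time and returns a replacement form, so the rebinding happens later, when eval-file evaluates the result.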

Here's the main entry point, which reads the files in a directory:

(defn read-data [directory]
  (let [nmsp *ns*]
    (->> (file-seq (clojure.java.io/file directory))
         (filter #(re-find #"\d{4}-\d{2}-\d{2}" (.getName %)))
         (sort-by #(.getName %))
         (mapcat #(eval-file nmsp %)))))

Note that the files are evaluated in the current namespace, which means a function defined within one file is available in all subsequent files. Alternatively, you can group function definitions in a library file and evaluate it before processing the data files.

Querying this DB is as easy as getting a list of maps from read-data and filtering them with Clojure's built-in collection functions. The above code can all be run in Babashka, so I have a suite of bb tasks that print reports on various aspects of the data (such as how much time I spend outdoors). Those plain-text reports can then be piped to commands that generate data visualizations.
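
For instance, a report on outdoor time might begin with something like this (the :outdoors key is a hypothetical example):

```clojure
(->> (read-data "data")
     (filter :outdoors)
     (group-by :date))
```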

Overall, this setup allows me to have a low-maintenance log that produces well-structured, queryable data, yet is also sufficiently human-readable.