Clojure

Nelson rules

I've used implementing Nelson rules as a technical interview challenge, because the rules themselves are quite easy to understand but how someone codes them shows a lot about their clarity of thought, from how they set up the problem to how they solve it. In this post I'm going to show one approach, relying heavily on Clojure's thread-last macro.

Rule 1: One point is more than 3 standard deviations from the mean.

For simplicity, I'm going to assume mean and standard deviation are passed to the function, rather than, say, the function calculating a rolling mean and standard deviation based on incoming data.

Thus, for the first rule, all we have to do is remove values within three standard deviations of the mean:

(defn nelson-1 [mean stdev xs]
  (let [low (- mean (* 3 stdev))
        high (+ mean (* 3 stdev))]
    (->> xs
      (map-indexed vector)
      (remove #(< low (second %) high)))))

I've used map-indexed to generate x-values. Let's test it out:

(nelson-1 0 1 [0 2 4 1 -1 -3 -2 0])

([2 4] [5 -3])

Rule 2: Nine (or more) points in a row are on the same side of the mean.

We'll use partition to create a sliding window and look for windows in which every value is greater than the mean or every value is less than the mean:

(defn nelson-2 [mean xs]
  (->> xs
    (map-indexed vector)
    (partition 9 1)
    (mapcat (fn [xys]
              (when (or (every? #(< mean (second %)) xys)
                        (every? #(> mean (second %)) xys))
                xys)))))

(nelson-2 0 [0 1 2 1 4 1 6 1 8 1 0 -1 1 -2 2
             -3 3 -1 -2 -3 -2 -1 -2 -3 -2 -1 0])

([1 1]
 [2 2]
 [3 1]
 [4 4]
 [5 1]
 [6 6]
 [7 1]
 [8 8]
 [9 1]
 [17 -1]
 [18 -2]
 [19 -3]
 [20 -2]
 [21 -1]
 [22 -2]
 [23 -3]
 [24 -2]
 [25 -1])

Rule 3: Six (or more) points in a row are continually increasing (or decreasing).

Similar to the implementation of Rule 2, but we have to throw in a distinct to avoid emitting values that have already been emitted:

(defn nelson-3 [xs]
  (->> xs
    (map-indexed vector)
    (partition 6 1)
    (mapcat (fn [xys]
              (when (or (apply < (map second xys))
                        (apply > (map second xys)))
                xys)))
    (distinct)))

(nelson-3 [-1 0 -1 1 2 3 4 5 4 3 2 1 -1 -3 -5 1 2 0])

([2 -1]
 [3 1]
 [4 2]
 [5 3]
 [6 4]
 [7 5]
 [8 4]
 [9 3]
 [10 2]
 [11 1]
 [12 -1]
 [13 -3]
 [14 -5])

Rule 4: Fourteen (or more) points in a row alternate in direction, increasing then decreasing.

We'll pair consecutive values and multiply every other comparison by -1. Then use partition-by to find runs of of the same comparison value:

(defn nelson-4 [xs]
  (->> xs
    (map-indexed vector)
    (partition 2 1)
    (map (fn [[[i a] [j b]]]
           [[[i a] [j b]]
            (* (if (even? i) 1 -1)
               (compare a b))]))
    (partition-by second)
    (filter #(<= 14 (count (distinct (mapcat first %)))))
    (mapcat #(mapcat first %))
    (distinct)))

(nelson-4 [0 1 2 3 4 3 5 2 6 1 7 0 8 -1 9 -2 10 11 13])

([3 3]
 [4 4]
 [5 3]
 [6 5]
 [7 2]
 [8 6]
 [9 1]
 [10 7]
 [11 0]
 [12 8]
 [13 -1]
 [14 9]
 [15 -2]
 [16 10])

Rule 5: Two (or three) points out of three points in a row are more than two standard deviations from the mean.

Create a rolling window of three values. For each window remove points within two standard deviations of the mean and keep those points where at least two fall on the same side of the mean:

(defn nelson-5 [mean stdev xs]
  (let [low (- mean (* 2 stdev))
        high (+ mean (* 2 stdev))]
    (->> xs
      (map-indexed vector)
      (partition 3 1)
      (mapcat (fn [xs]
                (->> xs
                  (remove #(<= low (second %) high))
                  (group-by #(compare mean (second %)))
                  (filter #(<= 2 (count (second %))))
                  (mapcat second))))
      (distinct))))

(nelson-5 0 1 [0 1 2 3 4 2 -1 -3 -3 -4 -5 -1 0 6])

([3 3] [4 4] [7 -3] [8 -3] [9 -4] [10 -5])

Rule 6: Four (or five) out of five points in a row are more than 1 standard deviation from the mean in the same direction.

Same idea, but create rolling windows of five values, and keep those points where at least four fall on the same side of the mean:

(defn nelson-6 [mean stdev xs]
  (let [low (- mean stdev)
        high (+ mean stdev)]
    (->> xs
      (map-indexed vector)
      (partition 5 1)
      (mapcat (fn [xs]
                (->> xs
                  (remove #(<= low (second %) high))
                  (group-by #(compare mean (second %)))
                  (filter #(<= 4 (count (second %))))
                  (mapcat second))))
      (distinct))))

(nelson-6 0 1 [0 2 3 2 4 1 -1 -2 -3 -2 -4 -5 -6 0 1 0 -3 -2 0])

([1 2] [2 3] [3 2] [4 4] [7 -2] [8 -3] [9 -2] [10 -4] [11 -5] [12 -6])

Rule 7: Fifteen points in a row are all within 1 standard deviation of the mean on either side of the mean.

This one is simpler than the last two, since we're just looking for a number of points in a row. For Rule #7, partition on whether points are within one standard deviation of the mean, then select segments whose points are within one standard deviation have fifteen or more values:

(defn nelson-7 [mean stdev xs]
  (let [low (- mean stdev)
        high (+ mean stdev)]
    (->> xs
      (map-indexed vector)
      (partition-by #(< low (second %) high))
      (filter #(and (< low (second (first %)) high)
                    (<= 15 (count %))))
      (mapcat seq))))

(nelson-7 0 2 [6 1 1 -1 1 -1 1 -1 1 -1 1 1 1 1 1 1
               5 1 -1 1 1 1 1 1 1 1 1 1 7])

([1 1]
 [2 1]
 [3 -1]
 [4 1]
 [5 -1]
 [6 1]
 [7 -1]
 [8 1]
 [9 -1]
 [10 1]
 [11 1]
 [12 1]
 [13 1]
 [14 1]
 [15 1])

Rule 8: Eight points in a row exist, but none within 1 standard deviation of the mean, and the points are in both directions from the mean.

Again, partition on whether points are within one standard deviation of the mean, but this time remove runs of points that are within that range. Select runs of at least eight values and that have at least one value above the mean and one value below it:

(defn nelson-8 [mean stdev xs]
  (let [low (- mean stdev)
        high (+ mean stdev)]
    (->> xs
      (map-indexed vector)
      (partition-by #(< low (second %) high))
      (remove #(< low (second (first %)) high))
      (filter #(<= 8 (count %)))
      (filter (fn [xys]
                (and (some #(= 1 (compare mean (second %))) xys)
                     (some #(= -1 (compare mean (second %))) xys))))
      (mapcat seq))))

(nelson-8 0 1 [0 2 3 4 5 4 3 2 3
               0 2 3 4 -5 4 3 2
               0 2 3 4 -5 4 3 2 3])

([18 2] [19 3] [20 4] [21 -5] [22 4] [23 3] [24 2] [25 3])

Conclusion

Instead of using the thread last macro, you can also use sequence with composed transducer functions, which avoids creating intermediate sequences. If you do, beware that clojure.core doesn't include a transducer equivalent of (partition n step coll). You'll either need to write a rolling window transducer, or use one from a library such as Christophe Grand's xforms.

June 2022