Skip to main content
3 of 5
added 2924 characters in body
Carcigenicate
  • 16.6k
  • 3
  • 37
  • 82

First, the most serious problem with your code:

Never, ever, ever use def inside of defn, unless you really know what you're doing. def creates global variables that last even once the function has returned:

(defn func []
  (def x 1))

(func)
(println x) ; Prints 1! Ouch!

The only time it's really ever appropriate to use def locally is when you intend to create globals. This is an extremely rare case though.

Instead, you should be using let. For example:

(defn cos-denominator [map-1 map-2]
  (let [p-1 (sum (squares (vals map-1)))
        p-2 (sum (squares (vals map-2)))]
    (* (Math/sqrt p-1) (Math/sqrt p-2))))

let creates a limited scope for the vals that it defines. Once you leave the let, they go out of scope.

That's the only really outright wrong thing I see in your code. The rest of my comments will be best practice related.


use, require and related functions shouldn't really be used directly. It's much cleaner to have them as part of the ns macro:

(ns clojure-first-try.core
  (:use [clojure.string :only [split]]
        [clojure.set :only [intersection]]))

Your use of $ as an identifier is bizarre. I had to double take there. I would recommend using something a little more descriptive. If you can't think of a good identifier to use, I think even a single letter would be better than a odd symbol like $.


(reduce + 0.0 arr))

is more typically written as

(apply + arr)

Your ways not wrong, the latter is just generally considered more idiomatic. This works because + has a var-arg overload that is basically an explicit reduction.

Also, you don't need to specify 0.0 as the starting accumulator of the reduction. + handles that for you.

(reduce + []) ; 0

(filter (complement empty?) arr)

can be written as

(remove empty? arr)

remove adds complement for you! Here's the core definition (basically):

(defn remove [pred coll] ; Abridged defintion
   (filter (complement pred) coll))

It reads much nicer without the explicit call to complement.


(= b 0.0)

can also be written as just:

(zero? b)

Although the gains aren't huge. I find its better when in a already busy line, like when checking for factors of a number. I find:

(zero? (rem a b))

to be neater than

(= 0 (rem a b))


For cos-numerator:

(intersection (set (keys map-1)) (set (keys map-2)))

is really inefficient. You iterate both maps just to put them in a set to test for intersections. Without doing more timing, it's unclear what the complexity of keys is. If it's O(n), this iterates each map twice, then again as a set. It's much more concise, and significantly faster to just use filter:

(filter map-1 (keys map-2))

Maps return nil on a bad lookup, so they can be used as predicates to check for membership.

I also think this could be cleaned up with ->. I changed it to:

(defn cos-numerator [map-1 map-2]
  (->> (filter map-1 (keys map-2))
       (map #(* (map-1 %) (map-2 %)))
       (sum)))

squares could also be written as

(map * arr arr))

You're going to have a little bit of duplication somewhere. It just depends what you want it to look like. I find this a little neater.


I'm a big fan of local anonymous functions to clean up code. cos-denominator has you doing the exact same thing to both maps, then taking their product.

I'd make the common transformation a local function, then use that instead:

(defn cos-denominator [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (* (proc map-1) (proc map-2))))

That has much less duplication. You could even go a little overboard and get rid of the duplicate calls to proc:

(defn cos-denominator2 [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (apply * (map proc [map-1 map-2]))))

Which could also be made a little neater with ->>:

(defn cos-denominator3 [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (->> [map-1 map-2]
         (map proc)
         (apply *))))


After the above changes, and a few other touchups that I thought made it neater, this is what I ended up with:

(ns clojure-first-try.core
  (:use [clojure.string :only [split]]
        [clojure.set :only [intersection]]
        [clojure.test :refer :all])
  (:require [criterium.core :as c]))

(defn sum [arr]
  (apply + arr))

(defn squares [arr]
  (map * arr arr))

(defn words-frequency [text]
  (as-> text t
        (split t #"\s+")
        (filter (complement empty?) t)
        (frequencies t)))

(defn cos-numerator [map-1 map-2]
  (->> (filter map-1 (keys map-2))
       (map #(* (map-1 %) (map-2 %)))
       (sum)))

(defn cos-denominator [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (->> [map-1 map-2]
         (map proc)
         (apply *))))

(defn cos-similarity [freq-1 freq-2]
  (let [a (cos-numerator freq-1 freq-2)
        b (cos-denominator freq-1 freq-2)]
    (if (zero? b)
      0.0
      (/ a b))))

(defn sort-by-similarity [base options]
  (let [base-freq (words-frequency base)
        sorter (comp #(cos-similarity base-freq %) words-frequency)]
    (sort-by sorter > options)))
Carcigenicate
  • 16.6k
  • 3
  • 37
  • 82