Skip to main content
added 2 characters in body
Source Link
Carcigenicate
  • 16.6k
  • 3
  • 37
  • 82

Although the gains aren't huge. I find itsit's better when in aan already busy line, like when checking for factors of a number. I find:

Although the gains aren't huge. I find its better when in a already busy line, like when checking for factors of a number. I find:

Although the gains aren't huge. I find it's better when in an already busy line, like when checking for factors of a number. I find:

added 1 character in body
Source Link
Carcigenicate
  • 16.6k
  • 3
  • 37
  • 82

I also think this could be cleaned up with ->>>. I changed it to:

I also think this could be cleaned up with ->. I changed it to:

I also think this could be cleaned up with ->>. I changed it to:

added 2924 characters in body
Source Link
Carcigenicate
  • 16.6k
  • 3
  • 37
  • 82
 

Those wereFor cos-numerator:

(intersection (set (keys map-1)) (set (keys map-2)))

is really inefficient. You iterate both maps just to put them in a set to test for intersections. Without doing more timing, it's unclear what the main thingscomplexity of keys is. I'll come back later with new eyesIf it's O(n), this iterates each map twice, then again as a set. It's much more concise, and see if Isignificantly faster to just use filter:

(filter map-1 (keys map-2))

Maps return nil on a bad lookup, so they can see anything elsebe used as predicates to check for membership.

Besides your abuse ofI also think this could be cleaned up with def-> here,. I changed it to:

(defn cos-numerator [map-1 map-2]
  (->> (filter map-1 (keys map-2))
       (map #(* (map-1 %) (map-2 %)))
       (sum)))

squares could also be written as

(map * arr arr))

You're going to have a little bit of duplication somewhere. It just depends what you want it to look like. I find this isn't terriblea little neater.


I'm a big fan of local anonymous functions to clean up code. You're using macros when appropriatecos-denominator has you doing the exact same thing to both maps, andthen taking their product.

I'd make goodthe common transformation a local function, then use that instead:

(defn cos-denominator [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (* (proc map-1) (proc map-2))))

That has much less duplication. You could even go a little overboard and get rid of the fundamental sequence processing functions likeduplicate calls to mapproc and:

(defn cos-denominator2 [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (apply * (map proc [map-1 map-2]))))

Which could also be made a little neater with filter->>.:

(defn cos-denominator3 [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (->> [map-1 map-2]
         (map proc)
         (apply *))))


After the above changes, and a few other touchups that I thought made it neater, this is what I ended up with:

(ns clojure-first-try.core
  (:use [clojure.string :only [split]]
        [clojure.set :only [intersection]]
        [clojure.test :refer :all])
  (:require [criterium.core :as c]))

(defn sum [arr]
  (apply + arr))

(defn squares [arr]
  (map * arr arr))

(defn words-frequency [text]
  (as-> text t
        (split t #"\s+")
        (filter (complement empty?) t)
        (frequencies t)))

(defn cos-numerator [map-1 map-2]
  (->> (filter map-1 (keys map-2))
       (map #(* (map-1 %) (map-2 %)))
       (sum)))

(defn cos-denominator [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (->> [map-1 map-2]
         (map proc)
         (apply *))))

(defn cos-similarity [freq-1 freq-2]
  (let [a (cos-numerator freq-1 freq-2)
        b (cos-denominator freq-1 freq-2)]
    (if (zero? b)
      0.0
      (/ a b))))

(defn sort-by-similarity [base options]
  (let [base-freq (words-frequency base)
        sorter (comp #(cos-similarity base-freq %) words-frequency)]
    (sort-by sorter > options)))

Those were the main things. I'll come back later with new eyes and see if I can see anything else.

Besides your abuse of def here, this isn't terrible code. You're using macros when appropriate, and make good use of the fundamental sequence processing functions like map and filter.

 

For cos-numerator:

(intersection (set (keys map-1)) (set (keys map-2)))

is really inefficient. You iterate both maps just to put them in a set to test for intersections. Without doing more timing, it's unclear what the complexity of keys is. If it's O(n), this iterates each map twice, then again as a set. It's much more concise, and significantly faster to just use filter:

(filter map-1 (keys map-2))

Maps return nil on a bad lookup, so they can be used as predicates to check for membership.

I also think this could be cleaned up with ->. I changed it to:

(defn cos-numerator [map-1 map-2]
  (->> (filter map-1 (keys map-2))
       (map #(* (map-1 %) (map-2 %)))
       (sum)))

squares could also be written as

(map * arr arr))

You're going to have a little bit of duplication somewhere. It just depends what you want it to look like. I find this a little neater.


I'm a big fan of local anonymous functions to clean up code. cos-denominator has you doing the exact same thing to both maps, then taking their product.

I'd make the common transformation a local function, then use that instead:

(defn cos-denominator [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (* (proc map-1) (proc map-2))))

That has much less duplication. You could even go a little overboard and get rid of the duplicate calls to proc:

(defn cos-denominator2 [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (apply * (map proc [map-1 map-2]))))

Which could also be made a little neater with ->>:

(defn cos-denominator3 [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (->> [map-1 map-2]
         (map proc)
         (apply *))))


After the above changes, and a few other touchups that I thought made it neater, this is what I ended up with:

(ns clojure-first-try.core
  (:use [clojure.string :only [split]]
        [clojure.set :only [intersection]]
        [clojure.test :refer :all])
  (:require [criterium.core :as c]))

(defn sum [arr]
  (apply + arr))

(defn squares [arr]
  (map * arr arr))

(defn words-frequency [text]
  (as-> text t
        (split t #"\s+")
        (filter (complement empty?) t)
        (frequencies t)))

(defn cos-numerator [map-1 map-2]
  (->> (filter map-1 (keys map-2))
       (map #(* (map-1 %) (map-2 %)))
       (sum)))

(defn cos-denominator [map-1 map-2]
  (let [proc #(-> % (vals) (squares) (sum) (Math/sqrt))]
    (->> [map-1 map-2]
         (map proc)
         (apply *))))

(defn cos-similarity [freq-1 freq-2]
  (let [a (cos-numerator freq-1 freq-2)
        b (cos-denominator freq-1 freq-2)]
    (if (zero? b)
      0.0
      (/ a b))))

(defn sort-by-similarity [base options]
  (let [base-freq (words-frequency base)
        sorter (comp #(cos-similarity base-freq %) words-frequency)]
    (sort-by sorter > options)))
added 178 characters in body
Source Link
Carcigenicate
  • 16.6k
  • 3
  • 37
  • 82
Loading
Source Link
Carcigenicate
  • 16.6k
  • 3
  • 37
  • 82
Loading