doxa
a simple in-memory database, trying to copy the best solutions from datascript,
crux, fulcro, autonormal, kechema, shadow-grow etc. each of them has its own
strengths, and is an excellent solution in its own narrow scope, but each lacks
some of the cool things that the others have. as usual in my case, everything
but the copying worst things came out poorly
this is an attempt to create a normalized db based on a simple hashmap, that
allows you to edit data both using clojure.core functions and using
transactions, where pull is incredibly fast, and that allows you to write
queries using datalog. surprisingly i was able to get most things working and
usable.
but basically why? primarily out of curiosity and boredom, and because datascript completely failed me when writing spas using re-frame & re-posh. while datascript has excellent search capabilities using datalog, it is, contrary to opinion, a deadly slow, unreliable solution. the db itself is also very cumbersome when trying to use tools like re-frame-10x, it also dramatically complicates sync data with firebase firestore and realtime db.
so what advantage does doxa have? by using a regular hasmap with structure
{?table {?eid {?k ?v ...} ...} ...} allows you to seamlessly synchronize data
with any document db. it also makes it easy to inspect data, save to disk,
upload, patch and much more. this design also allows for lightning-fast searches
using pull-syntax/eql, which is an order of magnitude faster than in datascript,
despite such a simple structure, it managed to create an incomplete and buggy
way to write queries using datalog, which are translated into a meander search,
thanks to the macro. even more surprising, the meander is so amazing
that the speed for simple queries is barely slower, while for more complex
queries performance matches datascript.
(require '[ribelo.doxa :as dx])
db structure
db is a simple map with three levels of nesting
{?table {?eid {:db/id ?eid
?k ?v}}
... {... {:db/id ...}}}
why not flat map? because this design allows for greater flexibility when it
comes to the operation of references and naming entity. [person/id :ivan] looks
better than :person.id/ivan, is easier to create and easier to use.
but wouldn’t operations on a flat map be faster? in normal use, the speed
difference is negligible, while a nested map can be faster than datascript due
to the possibility of limiting the iterated elements, see benchmark
usage examples
transactions
(def data [{:db/id 1 :name "Petr" :aka ["Devil"]}])
(def db (dx/create-dx data)
;; => #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Devil"]}}}
)
put
add entity
(dx/commit {} [[:dx/put {:db/id 1 :name "David" :aka ["Devil"]}]])
;; => #:db{:id {1 {:db/id 1, :name "David", :aka ["Devil"]}}}
(dx/commit {} [[:dx/put [:db/id 1] {:name "David" :aka ["Devil"]}]])
;; => #:db{:id {1 {:name "David", :aka ["Devil"], :db/id 1}}}
single keyword change
(dx/commit db [[:dx/put [:db/id 1] :name "David"]])
;; => #:db{:id {1 {:db/id 1, :name "David", :aka ["Devil"]}}}
(dx/commit db [[:dx/put [:db/id 1] :aka ["Tupen"]]])
;; => #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Tupen"]}}}
add data with autonormalization
(dx/commit db [[:dx/put [:db/id 1] :friend [{:db/id 2 :name "Ivan"} {:db/id 3 :name "Lucy"}]]])
;; =>
;; #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Devil"], :friend [[:db/id 2] [:db/id 3]]},
;; 2 {:db/id 2, :name "Ivan"},
;; 3 {:db/id 3, :name "Lucy"}}}
delete
delete entity
(dx/commit db [[:dx/delete [:db/id 1]]])
;; => {}
delete keyword
(dx/commit db [[:dx/delete [:db/id 1] :aka]])
;; => #:db{:id {1 {:db/id 1, :name "Petr"}}}
remove elem from vector
(dx/commit db [[:dx/delete [:db/id 1] :aka "Devil"]])
;; => #:db{:id {1 {:db/id 1, :name "Petr"}}}
remove an invalid key
(dx/commit db [[:dx/delete [:db/id 1] :AKA "Devil"]])
;; => #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Devil"]}}}
conj
because the database is schemeless, if we want to add something to the vector we
have to use conj
add elem
(dx/commit db [[:dx/conj [:db/id 1] :aka "Tupen"]])
;; => #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Devil" "Tupen"]}}}
(dx/commit db [[:dx/conj [:db/id 1] :name "Ivan"]])
;; => #:db{:id {1 {:db/id 1, :name ["Petr" "Ivan"], :aka ["Devil"]}}}
with autonormalisation
(dx/commit db [[:dx/conj [:db/id 1] :friend {:db/id 2 :name "Ivan"}]])
;; =>
;; #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Devil"], :friend [[:db/id 2]]},
;; 2 {:db/id 2, :name "Ivan"}}}
(dx/commit db [[:dx/conj [:db/id 1] :friend [{:db/id 2 :name "Ivan"} {:db/id 3 :name "Lucy"}]]])
;; =>
;; #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Devil"], :friend [[:db/id 2] [:db/id 3]]},
;; 2 {:db/id 2, :name "Ivan"}, 3
;; {:db/id 3, :name "Lucy"}}}
update
(dx/commit db [[:dx/update [:db/id 1] assoc :aka "Tupen"]])
;; => #:db{:id {1 {:db/id 1, :name "Petr", :aka "Tupen"}}}
(dx/commit db [[:dx/update [:db/id 1] :aka conj "Tupen"]])
;; => #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Devil" "Tupen"]}}}
match
just like in crux, we can use match
if data match, db is returned unchanged, otherwise nil
match entity
(dx/commit db [[:dx/match [:db/id 1] {:db/id 1 :name "Petr", :aka ["Devil"]}]])
;; => #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Devil"]}}}
match keyword
(dx/commit db [[:dx/match [:db/id 1] :aka ["Devil"]]])
;; => #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Devil"]}}}
conditional put
(dx/commit db [[:dx/match [:db/id 1] :aka ["Devil"]]
[:dx/put [:db/id 1] :aka ["Tupen"]]])
;; => #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Tupen"]}}}
conditional delete
(dx/commit db [[:dx/match [:db/id 1] :aka ["Tupen"]]
[:dx/delete [:db/id 1] :aka]])
;; => #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Devil"]}}}
transactions are dropped until the next valid match occurs
(dx/commit db [[:dx/match [:db/id 1] :aka ["Tupen"]]
[:dx/put [:db/id 1] :age 15]
[:dx/match [:db/id 1] :name "Petr"]
[:dx/put [:db/id 1] :sex :male]])
;; => #:db{:id {1 {:db/id 1, :name "Petr", :aka ["Devil"], :sex :male}}}pull
(def people-docs
[{:db/id 1, :name "Petr", :aka ["Devil" "Tupen"] :child [[:db/id 2] [:db/id 3]]}
{:db/id 2, :name "David", :father [[:db/id 1]]}
{:db/id 3, :name "Thomas", :father [[:db/id 1]]}
{:db/id 4, :name "Lucy" :friend [[:db/id 5]], :enemy [[:db/id 6]]}
{:db/id 5, :name "Elizabeth" :friend [[:db/id 6]], :enemy [[:db/id 7]]}
{:db/id 6, :name "Matthew", :father [[:db/id 3]], :friend [[:db/id 7]], :enemy [[:db/id 8]]}
{:db/id 7, :name "Eunan", :friend [[:db/id 8]], :enemy [[:db/id 4]]}
{:db/id 8, :name "Kerri"}
{:db/id 9, :name "Rebecca"}])
(def db (dx/create-dx people-docs))
eql
(dx/pull db {[:db/id 1] [:name :aka]})
;; => {:name "Petr", :aka ["Devil"]}
datomic like pull syntax
(dx/pull db [:name :aka] [:db/id 1])
;; => {:name "Petr", :aka ["Devil"]}
simple query
(dx/pull db [:name :father :db/id] [:db/id 6])
;; => {:name "Matthew", :father [:db/id 3], :db/id 6}
pull many
(dx/pull db [:name] [[:db/id 1] [:db/id 5] [:db/id 7] [:db/id 9]])
;; => [{:name "Petr"} {:name "Elizabeth"} {:name "Eunan"} {:name "Rebecca"}]
reverse search
(dx/pull db [:name :_child] [:db/id 2])
;; => {:name "David", :_child [:db/id 1]}
(dx/pull db [:name {:_child [:name]}] [:db/id 2])
;; => {:name "David", :_child {:name "Petr"}}
reverse non-component references yield collections
(dx/pull db '[:name :_father] [:db/id 3])
;; => {:name "Thomas", :_father [:db/id 6]}
(dx/pull db '[:name :_father] [:db/id 1])
;; => {:name "Petr", :_father [[:db/id 3] [:db/id 2]]}
(dx/pull db '[:name {:_father [:name]}] [:db/id 3])
;; => {:name "Thomas", :_father {:name "Matthew"}}
(dx/pull db '[:name {:_father [:name]}] [:db/id 1])
;; => {:name "Petr", :_father [{:name "Thomas"} {:name "David"}]}
wildcard
(dx/pull db [:*] [:db/id 1])
;; =>
;; {:db/id 1, :name "Petr", :aka ["Devil" "Tupen"], :child [[:db/id 2] [:db/id 3]]}
(dx/pull db [:* :_child] [:db/id 2])
;; => {:db/id 2, :name "David", :father [:db/id 1], :_child [:db/id 1]}
missing attrs are dropped
(dx/pull db [:name {:child [:name]}] [:db/id 2])
;; => {:name "David"}
non matching results are removed from collections
(dx/pull db [:name {:child [:foo]}] [:db/id 1])
;; => {:name "Petr", :child []}
datalog
(def db (dx/create-dx [{:db/id 1, :name "Ivan" :age 15}
{:db/id 2, :name "Petr" :age 37}
{:db/id 3, :name "Ivan" :age 37}
{:db/id 4, :age 15}]))
joins
unlike everything else, doxa does not return a set, but a vector, which has far-reaching consequences
(dx/q [:find ?e
:where [?e :name]]
db)
;; => [[1] [2] [3]]
(dx/q [:find ?e ?v
:where
[?e :name "Ivan"]
[?e :age ?v]]
db)
;; => [[1 15] [3 37]]
each element is checked once, so the result in a normal engine would be [[1 1] [1 3] [3 3]]
(dx/q [:find ?e1 ?e2
:where
[?e1 :name ?n]
[?e2 :name ?n]] db)
;; => [[1 3] [3 1]]
(dx/q [:find ?e1 ?e2 ?n
:where
[?e1 :name "Ivan"]
[?e1 :age ?a]
[?e2 :age ?a]
[?e2 :name ?n]]
db)
;; => [[3 2 "Petr"]]
many
meander is running underneath, so you can use all the functions available in the meander, e.g. scan
(def db (dx/create-dx [{:db/id 1
:name "Ivan"
:aka ["ivolga" "pi"]}
{:db/id 2
:name "Petr"
:aka ["porosenok" "pi"]}]))
(dx/q [:find ?n1 ?n2
:where
[?e1 :aka (m/scan ?x)]
[?e2 :aka (m/scan ?x)]
[?e1 :name ?n1]
[?e2 :name ?n2]]
db)
;; => [["Ivan" "Petr"] ["Petr" "Ivan"]]
in
(def db (dx/create-dx [{:db/id 1, :name "Ivan" :age 15 :email "ivan@mail.ru"}
{:db/id 2, :name "Petr" :age 37 :email "petr@gmail.com"}
{:db/id 3, :name "Ivan" :age 37 :email "ivan@mail.ru"}]))
(dx/q [:find ?e
:in ?attr ?value
:where [?e ?attr ?value]]
db :name "Ivan")
;; => [[1] [3]]
(dx/q [:find ?e
:in ?attr [?value]
:where [?e ?attr ?value]]
db :name ["Ivan" "Petr"])
;; => [[1] [2] [3]]
(dx/q [:find ?e
:in ?attr ?value
:where [?e ?attr ?value]]
db :age 37)
;; => [[2] [3]]
relation binding
(dx/q [:find ?e ?email
:in [[?n ?email]]
:where
[?e :name ?n]
[?e :email ?email]]
db
[["Ivan" "ivan@mail.ru"]
["Petr" "petr@gmail.com"]])
;; => [[1 "ivan@mail.ru"] [2 "petr@gmail.com"] [3 "ivan@mail.ru"]]
joins with idents
unfortunately, but using links in the form of [?table ?id] also entails disadvantages and difficulties.
(def db (dx/create-dx [{:db/id 1
:name "Ivan"
:friend [{:db/id 2
:name "Petr"}
{:db/id 3
:name "Oleg"}]}]))
db
;; {:db/id {2 {:db/id 2, :name "Petr"}
;; 3 {:db/id 3, :name "Oleg"}
;; 1 {:db/id 1,
;; :name "Ivan",
;; :friend [[:db/id 2] [:db/id 3]]}}}
references are always vector and must be treated as such
(dx/q [:find [?friends ...]
:where
[?e :name "Ivan"]
[?e :friend ?friends]]
db)
;; => [[[:db/id 2] [:db/id 3]]]
if we try to do a simple join we get nothing :(
(dx/q [:find ?fname .
:where
[?e :name "Ivan"]
[?e :friend ?friends]
[?friends :name ?fname]]
db)
;; => []
but knowing what a reference looks like, we can get around this
(dx/q [:find [?fname ...]
:where
[?e :name "Ivan"]
[?e :friend [_ ?friend]]
[?friend :name ?fname]]
db)
;; => ["Petr" "Oleg"]
at the moment my knowledge of meader internals is too limited to make it nicer
benchmark
(require '[taoensso.encore :as enc])
(require '[datascript.core :as d])
(require '[ribelo.doxa :as dx])
It is rare for a spa database to contain things that cannot be divided into tables or assigned categories. so let’s create 100k maps for 10 different categories
(let [next-eid (volatile! 0)]
(defn random-man []
{:db/id (vswap! next-eid inc)
:name (rand-nth ["Ivan" "Petr" "Sergei" "Oleg" "Yuri" "Dmitry" "Fedor" "Denis"])
:last-name (rand-nth ["Ivanov" "Petrov" "Sidorov" "Kovalev" "Kuznetsov" "Voronoi"])
:alias (vec
(repeatedly (rand-int 10) #(rand-nth ["A. C. Q. W." "A. J. Finn" "A.A. Fair" "Aapeli" "Aaron Wolfe" "Abigail Van Buren" "Jeanne Phillips" "Abram Tertz" "Abu Nuwas" "Acton Bell" "Adunis"])))
:age (rand-int 100)
:salary (rand-int 100000)
:friend {:db/ref-id (rand-int 20000)}})
(defn random-fruit []
{:fruit/id (vswap! next-eid inc)
:name (rand-nth ["Avocado" "Grape" "Plum" "Apple" "Orange"])
:price (rand-int 100)})
(defn random-vegetable []
{:vegetable/id (vswap! next-eid inc)
:name (rand-nth ["Onion" "Cabbage" "Pea" "Tomatto" "Lettuce"])
:price (rand-int 100)})
(defn random-car []
{:car/id (vswap! next-eid inc)
:name (rand-nth ["Audi" "Mercedes" "BMW" "Ford" "Honda" "Toyota"])
:price (rand-int 100)})
(defn random-animal []
{:animal/id (vswap! next-eid inc)
:name (rand-nth ["Otter" "Dog" "Panda" "Lynx" "Cat" "Lion"])
:price (rand-int 100)})
(defn random-cat []
{:cat/id (vswap! next-eid inc)
:name (rand-nth ["Traditional Persian" "Ocicat" "Munchkin cat" "Persian cat" "Burmese cat"])
:price (rand-int 100)})
(defn random-dog []
{:dog/id (vswap! next-eid inc)
:name (rand-nth ["Croatian Shepherd" "Deutch Langhaar" "Miniature Pincher" "Italian Sighthound" "Jack Russell Terrier"])
:price (rand-int 100)})
(defn random-country []
{:country/id (vswap! next-eid inc)
:name (rand-nth ["Seychelles" "Greenland" "Iceland" "Bahrain" "Bhutan"])
:price (rand-int 100)})
(defn random-language []
{:language/id (vswap! next-eid inc)
:name (rand-nth ["Malagasy" "Kashmiri" "Amharic" "Inuktitut" "Esperanto"])
:price (rand-int 100)})
(defn random-marijuana-strain []
{:marijuana/id (vswap! next-eid inc)
:name (rand-nth ["Lemonder" "Black-Mamba" "Blueberry-Space-Cake" "Strawberry-Amnesia"])
:price (rand-int 100)})
(defn random-planet []
{:planet/id (vswap! next-eid inc)
:name (rand-nth ["Pluto" "Saturn" "Venus" "Mars" "Jupyter"])
:price (rand-int 100)}))
(def people (repeatedly random-man))
(def fruit (repeatedly random-fruit))
(def vegetable (repeatedly random-vegetable))
(def car (repeatedly random-car))
(def animal (repeatedly random-animal))
(def cat (repeatedly random-cat))
(def dog (repeatedly random-dog))
(def country (repeatedly random-country))
(def language (repeatedly random-language))
(def marijuana-strain (repeatedly random-marijuana-strain))
(def planet (repeatedly random-planet))
(def people50k (shuffle (take 50000 people)))
(def fruit10k (shuffle (take 10000 fruit)))
(def vegetable10k (shuffle (take 10000 vegetable)))
(def car10k (shuffle (take 10000 car)))
(def animal10k (shuffle (take 10000 animal)))
(def cat10k (shuffle (take 10000 cat)))
(def dog10k (shuffle (take 10000 dog)))
(def country10k (shuffle (take 10000 country)))
(def language10k (shuffle (take 10000 language)))
(def marijuana-strain10k (shuffle (take 10000 marijuana-strain)))
(def planet10k (shuffle (take 10000 planet)))
(def data100k (enc/into-all []
fruit10k vegetable10k car10k animal10k cat10k dog10k
country10k language10k marijuana-strain10k planet10k))
(def schema
{:friend {:db/valueType :db.type/ref
:db/cardinality :db.cardinality/many}
:alias {:db/cardinality :db.cardinality/many}})transaction
adding data one transaction at a time
(defn datascript-add-1 [data]
(enc/qb 1
(reduce
(fn [db p]
(-> db
(d/db-with [[:db/add (:db/id p) :name (:name p)]])
(d/db-with [[:db/add (:db/id p) :last-name (:last-name p)]])
(d/db-with [[:db/add (:db/id p) :age (:age p)]])
(d/db-with [[:db/add (:db/id p) :salary (:salary p)]])))
(d/empty-db schema)
data)))
(defn doxa-add-1 [data]
(enc/qb 1
(reduce
(fn [db p]
(dx/commit db [[:dx/put p]]))
{}
data)))
;; result in ms
[(datascript-add-1 people50k) (doxa-add-1 people50k)]
;; clj => [1520.5 166.99]
add all data in single transaction
(defn datascript-add-all []
(enc/qb 1
(d/db-with (d/empty-db schema) people50k)))
(defn doxa-add-all []
(enc/qb 1
(->> (into []
(map (fn [p] [:dx/put p]))
people50k)
(dx/commit {}))))
[(datascript-add-all) (doxa-add-all)]
;; clj => [1483.59 42.56]
query
can we be faster than datascript? yes!
(def db100k
(d/db-with (d/empty-db)
(mapv
(fn [m]
(reduce-kv
(fn [acc k v]
(if (= :id (name k))
(assoc acc :db/id v)
(assoc acc k v)))
{}
m))
data100k)))
(def dx100k (dx/create-dx data100k))
(defn datascript-query []
(enc/qb 1e1
(d/q '[:find ?e
:where
[?e :name "Avocado"]
[?e :price ?price]
[(< ?price 50)]]
db100k)))
(defn dx-query []
(enc/qb 1e1
(dx/q [:find ?e
:where
[?e :name "Avocado"]
[?e :price ?price]
[(< ?price 50)]]
dx100k)))
(defn fast-dx-query []
(enc/qb 1e1
(dx/q [:find ?e
:in ?table
:where
[?table ?e :name "Avocado"]
[?table ?e :price ?price]
[(< ?price 50)]]
dx100k :fruit/id)))
[(datascript-query) (dx-query) (fast-dx-query)]
;; clj => [182.98 685.15 71.05]query by one condition
(defn datascript-q1 []
(enc/qb 1
(d/q '[:find ?e
:where [?e :name "Ivan"]]
db100k)))
(defn dx-q1 []
(enc/qb 1
(dx/q [:find ?e
:where [?e :name "Ivan"]]
dx100k)))
[(datascript-q1) (dx-q1)]
;; cljs => [ 9 51]
;; clj => [3.56 13.95]
two conditions
(defn datascript-q2 []
(enc/qb 1e1
(d/q '[:find ?e ?a
:where [?e :name "Ivan"]
[?e :age ?a]]
db100k)))
(defn dx-q2 []
(enc/qb 1e1
(dx/q [ :find [?e ?a]
:where [?e :name "Ivan"]
[?e :age ?a]]
dx100k)))
[(datascript-q2) (dx-q2)]
;; cljs => [ 242 618]
;; clj => [65.51 142.94]
3
(defn datascript-q3 []
(enc/qb 1e1
(d/q '[:find ?e ?a
:where [?e :name "Ivan"]
[?e :age ?a]
[?e :sex :male]]
db100k)))
(defn dx-q3 []
(enc/qb 1e1
(dx/q [:find [?e ?a]
:where [?e :name "Ivan"]
[?e :age ?a]
[?e :sex :male]]
dx100k)))
[(datascript-q3) (dx-q3)]
;; cljs => [ 409 646]
;; clj => [94.34 141.06]
4
(defn datascript-q4 []
(enc/qb 1e1
(d/q '[:find ?e ?l ?a
:where [?e :name "Ivan"]
[?e :last-name ?l]
[?e :age ?a]
[?e :sex :male]]
db100k)))
(defn dx-q4 []
(enc/qb 1e1
(doall
(dx/q [:find [?e ?l ?a]
:where [?e :name "Ivan"]
[?e :last-name ?l]
[?e :age ?a]
[?e :sex :male]]
dx100k))))
[(datascript-q4) (dx-q4)]
;; cljs => [ 588 681]
;; clj => [149.9 142.44]
one pred
(defn datascript-qpred1 []
(enc/qb 1e1
(d/q '[:find ?e ?s
:where [?e :salary ?s]
[(> ?s 50000)]]
db100k)))
(defn dx-qpred1 []
(enc/qb 1e1
(dx/q [:find ?e ?s
:where [?e :salary ?s]
[(> ?s 50000)]]
dx100k)))
[(datascript-qpred1) (dx-qpred1)]
;; cljs => [ 321 959]
;; clj => [93.36 179.29]
pull
one key
(defn datascript-pull1 []
(enc/qb 1e3
(d/pull db100k [:name] (rand-int 20000))))
(defn dx-pull1 []
(enc/qb 1e3
(dx/pull dx100k [:name] [:db/id (rand-int 20000)])))
[(datascript-pull1) (dx-pull1)]
;; cljs => [ 15 8]
;; clj => [12.37 2.23]
entire map
(defn datascript-pull2 []
(enc/qb 1e3
(d/pull db100k ['*] (rand-int 20000))))
(defn dx-pull2 []
(enc/qb 1e3
(dx/pull dx100k [:*] [:db/id (rand-int 20000)])))
[(datascript-pull2) (dx-pull2)]
;; cljs => [ 43 11]
;; clj => [38.52 3.81]
joins
(defn datascript-pull3 []
(enc/qb 1e3
(d/pull db100k [:name {:friend [:name]}] (rand-int 20000))))
(defn dx-pull3 []
(enc/qb 1e3
(dx/pull dx100k [:name {:friend [:name]}] [:db/id (rand-int 20000)])))
[(datascript-pull3) (dx-pull3)]
;; cljs => [ 42 19]
;; clj => [20.63 2.84]
Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is 100% composed of volunteers and interested parties, and has expanded into a large amount of related projects for saving online and digital history.
