MemPool
Simple distributed datastore that supports custom serialization, spilling least recently used data to disk and memory-mapping.
Usage
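using Distributed  # required on Julia 0.7 and later, where addprocs lives in the Distributed stdlib
addprocs(4)
@everywhere using MemPool  # load MemPool on the workers as well as on the master process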
@everywhere MemPool.max_memsize[] = 10^9 # 1 GB per worker
This sets the memory limit on each process to 10^9 bytes (1 GB). If the limit is exceeded, the least recently used data is written to disk using movetodisk (described below) until the total pool size is back below 1 GB. Spilled data is written to a directory called .mempool and can be read back with memory mapping. For efficiency, defining your own mmwrite and mmread methods (described in the next section) is recommended.
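To make the spilling policy concrete, here is a toy, single-process sketch of LRU spilling built on Julia's Serialization standard library. Everything in it (ToyLRUPool, toy_set!, toy_get, using Base.summarysize as the size estimate, and a temporary directory standing in for .mempool) is a hypothetical illustration, not MemPool's implementation; it also writes with serialize rather than mmwrite.

```julia
# Toy model of LRU spilling (hypothetical; not MemPool's implementation).
using Serialization

mutable struct ToyLRUPool
    max_memsize::Int              # byte budget, analogous to MemPool.max_memsize[]
    data::Dict{Int,Any}           # id => value held in memory
    sizes::Dict{Int,Int}          # id => estimated in-memory size in bytes
    order::Vector{Int}            # in-memory ids, least recently used first
    spilled::Dict{Int,String}     # id => file holding the spilled value
    dir::String                   # spill directory (stand-in for .mempool)
    next_id::Int
end

ToyLRUPool(max_memsize::Int; dir::String = mktempdir()) =
    ToyLRUPool(max_memsize, Dict{Int,Any}(), Dict{Int,Int}(), Int[],
               Dict{Int,String}(), dir, 1)

# Total estimated size of everything currently held in memory.
memsize(p::ToyLRUPool) = sum(values(p.sizes); init = 0)

# Move id to the most-recently-used end of the order.
function mark_used!(p::ToyLRUPool, id::Int)
    filter!(!=(id), p.order)
    push!(p.order, id)
    return p
end

# Write least recently used values to disk until the pool fits the budget.
function spill!(p::ToyLRUPool)
    while memsize(p) > p.max_memsize && !isempty(p.order)
        id = popfirst!(p.order)
        path = joinpath(p.dir, string(id))
        serialize(path, p.data[id])
        p.spilled[id] = path
        delete!(p.data, id)
        delete!(p.sizes, id)
    end
    return p
end

# Store x, mark it most recently used, then spill if over budget.
function toy_set!(p::ToyLRUPool, x)
    id = p.next_id
    p.next_id += 1
    p.data[id] = x
    p.sizes[id] = Base.summarysize(x)
    mark_used!(p, id)
    spill!(p)
    return id
end

# Fetch a value, reading it back from disk and re-caching it if it was spilled.
function toy_get(p::ToyLRUPool, id::Int)
    if haskey(p.data, id)
        x = p.data[id]
    else
        x = deserialize(p.spilled[id])   # read the spilled value back
        delete!(p.spilled, id)
        p.data[id] = x                   # cache it in memory again
        p.sizes[id] = Base.summarysize(x)
    end
    mark_used!(p, id)                    # now the most recently used entry
    spill!(p)                            # may evict other, older entries
    return x
end
```

With a 1500-byte budget, storing two 1000-byte vectors pushes the pool over the limit, so the older one is spilled; reading it back makes it most recently used and spills the other.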
Data store functions:
- poolset(x::Any, pid=myid()): stores the object x on process pid and returns a DRef object.
- poolget(r::DRef): gets the data stored at r. If the data has been moved to disk, it is read on the caller side.
- pooldelete(r::DRef): removes the data at r, including any copy on disk that was not saved using savetodisk.
- movetodisk(r::DRef): moves the data to disk and releases it from memory, using MemPool.mmwrite (see the section below) to write it. Returns a FileRef, which can be passed to poolget to read the data. Subsequent poolget calls on r itself read the data from disk, cache it in memory and mark it as most recently used.
- copytodisk(r::DRef): copies the data to disk while keeping the original copy in memory. Subsequent poolget(r) calls read the data from disk on the callee process, or return the cached value if the callee owns the ref.
- savetodisk(r::DRef, path): saves the data to the given file path. It leaves the original data in memory and does not affect LRU accounting. Use this when you want to explicitly save data in the format described below.
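The difference between moving, copying and deleting can be modelled with a hypothetical single-process stand-in for a DRef. ToyRef and the toy_* functions below are illustrative names only, not MemPool's API; the sketch uses Serialization.serialize instead of mmwrite and ignores which process owns the data.

```julia
# Hypothetical stand-in for a DRef, used only to illustrate the semantics above.
using Serialization

mutable struct ToyRef
    value::Any                     # in-memory copy, or `nothing` once moved to disk
    path::Union{Nothing,String}    # disk copy, if any
end

toy_poolset(x) = ToyRef(x, nothing)

# Like movetodisk: write to disk, then release the in-memory copy.
function toy_movetodisk!(r::ToyRef)
    r.path = tempname()
    serialize(r.path, r.value)
    r.value = nothing
    return r.path                  # plays the role of a FileRef
end

# Like copytodisk: write to disk but keep the in-memory copy.
function toy_copytodisk!(r::ToyRef)
    r.path = tempname()
    serialize(r.path, r.value)
    return r.path
end

# Like poolget: return the cached value, or read it from disk and cache it.
# (This toy cannot distinguish a stored `nothing` from a moved value.)
function toy_poolget(r::ToyRef)
    if r.value === nothing
        r.value = deserialize(r.path)
    end
    return r.value
end

# Like pooldelete: drop the in-memory value and remove the disk copy.
function toy_pooldelete!(r::ToyRef)
    r.value = nothing
    r.path !== nothing && rm(r.path; force = true)
    r.path = nothing
    return nothing
end
```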
MemPool.mmwrite, MemPool.mmread
mmwrite and mmread are fast alternatives to Base.serialize and Base.deserialize that can memory-map data when it is read from disk. They fall back to Base.serialize so as to support all Julia types. This format is only suitable for temporary storage, since the implementations of all four functions can change.
- mmwrite(s::AbstractSerializer, x::Any) is called to write data to the wire or to a file when data needs to be transferred or written to disk. Packages can define how parts of their data structures are written in a raw format that can later be memory-mapped back with mmread. An mmwrite method must begin with the call Base.serialize_type(s, MemPool.MMSer{typeof(x)}) so that Julia's base serializer will dispatch any deserialization to mmread.
- mmread(::Type{T}, io::AbstractSerializer) is called to deserialize data written with mmwrite.
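The core idea behind a raw, memory-mappable format can be sketched with the Mmap standard library: write a small header giving the array's dimensions, then the element bytes, and later map those bytes straight back into an Array. write_raw and read_mmapped are hypothetical helpers; the sketch deliberately skips the serializer hook (Base.serialize_type and MMSer) and is not MemPool's actual on-disk layout.

```julia
# Hypothetical sketch of a raw array format that can be mmapped back.
# Only isbits element types are supported.
using Mmap

function write_raw(path::AbstractString, A::Array{T,N}) where {T,N}
    isbitstype(T) || throw(ArgumentError("element type must be an isbits type"))
    open(path, "w") do io
        write(io, Int64(N))            # header: number of dimensions
        for d in size(A)
            write(io, Int64(d))        # header: each dimension
        end
        write(io, A)                   # raw element bytes, no per-element encoding
    end
    return path
end

function read_mmapped(path::AbstractString, ::Type{T}) where {T}
    open(path, "r") do io
        n = Int(read(io, Int64))                     # number of dimensions
        dims = ntuple(_ -> Int(read(io, Int64)), n)  # the dimensions themselves
        # Map the element bytes in place, starting right after the header;
        # the data is paged in lazily instead of being copied into memory.
        Mmap.mmap(io, Array{T,n}, dims, position(io))
    end
end
```

Because the elements are stored exactly as they sit in memory, reading them back needs no decoding step, which is what makes this kind of format cheap to load compared with element-by-element deserialization.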
mmwrite can currently store Array{String} much more efficiently than Base.serialize. JuliaDB also extends it for fast storage of NullableArrays, PooledArrays and IndexedTables.
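One common way to store an Array{String} compactly is to concatenate the UTF-8 bytes of all strings into a single buffer and keep a vector of end offsets, so that both parts are flat bit arrays that can be written raw and memory-mapped back. The sketch below (pack_strings and unpack_strings are hypothetical names) only shows that layout; MemPool's actual encoding may differ.

```julia
# Hypothetical packed layout for a Vector{String}: one byte buffer plus end offsets.

function pack_strings(v::Vector{String})
    data = UInt8[]
    ends = Vector{Int}(undef, length(v))
    for (i, s) in enumerate(v)
        append!(data, codeunits(s))    # raw UTF-8 bytes of this string
        ends[i] = length(data)         # index of its last byte in `data`
    end
    return data, ends
end

function unpack_strings(data::Vector{UInt8}, ends::Vector{Int})
    out = Vector{String}(undef, length(ends))
    start = 1
    for (i, stop) in enumerate(ends)
        # The slice is a copy, so String can safely take ownership of it.
        out[i] = String(data[start:stop])
        start = stop + 1
    end
    return out
end
```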

