Parquez
A mechanism for storing fresh (hot) data in the NoSQL database and historical data in Parquet/ORC, while providing users a single access point (via a view) for easier access to both real-time and historical data.
The view is created in Presto on top of Hive & V3IO KV. Once the user creates the view, an automated job is scheduled at the given interval:
- The job recreates the view
- The job deletes the old KV partitions and the old Parquet files
- The job runs on the app nodes
- The job is based on crontab
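As an illustration only (the script path, schedule, and log location below are assumptions, not taken from the repo), an hourly crontab entry for such a job might look like:

```shell
# Hypothetical crontab entry: every hour, recreate the view and drop
# expired KV partitions / Parquet files (script path is an assumption)
0 * * * * /home/iguazio/parquez/parquez_job.sh >> /tmp/parquez_job.log 2>&1
```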
Users can create a view for the “parquez” table using a script or a REST call.

Script parameters
view-name : The unified view name (Parquet and KV)
partition-by [h / d / M / y] : Only time-based partitioning is supported in this phase
partition-interval [1-24h / 1-31d / 1-12M / 1-Ny] : The partition creation interval
real-time-table-name : The KV table for the view (the full path must be specified)
real-time-window [h / d / M / y] : The time window for storing data in key/value (hot data)
historical-retention [h / d / M / y] : The retention period for all parquez data
config : The config file path
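To make the time parameters concrete: with --real-time-window 3h and --historical-retention 21h (the values used in the run example below), data newer than 3 hours is served from KV, data between 3 and 21 hours old is served from Parquet, and anything older is deleted. A small sketch (not part of Parquez itself) of how a cleanup job could compute those cutoffs with GNU date:

```shell
#!/bin/sh
# Illustrative only: compute the hot-data and retention cutoffs a cleanup
# job would need, using the example values from this README.
REAL_TIME_WINDOW_HOURS=3
HISTORICAL_RETENTION_HOURS=21

NOW=$(date -u +%s)
HOT_CUTOFF=$((NOW - REAL_TIME_WINDOW_HOURS * 3600))        # newer -> KV, older -> Parquet
DELETE_CUTOFF=$((NOW - HISTORICAL_RETENTION_HOURS * 3600)) # older -> deleted

echo "hot data newer than:      $(date -u -d "@${HOT_CUTOFF}" '+%Y-%m-%d %H:00')"
echo "retained data newer than: $(date -u -d "@${DELETE_CUTOFF}" '+%Y-%m-%d %H:00')"
```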
Config file parameters
[v3io]
v3io_container = bigdata
access_key = <access_key>
[hive]
hive_schema = default
[presto]
uri = <presto_uri>
v3io_connector = v3io
hive_connector = hive
[nginx]
v3io_api_endpoint_host = <v3io_api_endpoint_host>
v3io_api_endpoint_port = 443
username = <user_name>
password =
[compression]
type = Parquet
coalesce = 6
# set environment to k8s/vanilla
[environment]
type = k8s
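The config is a standard INI file; a wrapper script could pull individual keys out with a small awk helper. This is a sketch only (it is not how Parquez itself reads the file), and it assumes the simple key = value layout shown above:

```shell
#!/bin/sh
# Illustrative INI lookup: print the value of a key within a section.
ini_get() { # usage: ini_get <file> <section> <key>
  awk -F ' *= *' -v section="[$2]" -v key="$3" '
    $0 == section { in_sec = 1; next }  # entered the requested section
    /^\[/         { in_sec = 0 }        # any other section header ends it
    in_sec && $1 == key { print $2; exit }
  ' "$1"
}

# Hypothetical sample file using keys from the config shown above
cat > /tmp/parquez.ini <<'EOF'
[hive]
hive_schema = default
[presto]
uri = presto-host:8080
EOF

ini_get /tmp/parquez.ini presto uri   # prints: presto-host:8080
```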
Prerequisites
- parquez scripts
- partitioned kv table
Building / deploying the functions
Clone this repository and cd into it:
mkdir parquez && \
git clone https://github.com/iguazio/parquez.git && \
cd parquez

Run parquez:
./run_parquez.sh --view-name parquezView --partition-by h --partition-interval 1h --real-time-window 3h \
--historical-retention 21h --real-time-table-name table_name --config config/parquez.ini
