Parquez
A mechanism for storing fresh (hot) data in the NoSQL database and historical data in Parquet/ORC, while providing users a single access point (via a view) for easier access to both real-time and historical data.
The view is created in Presto on top of Hive & V3IO KV. Once the user creates the view, an automated job is scheduled at the given interval:
- The job recreates the view
- The job deletes the old KV partitions and the old Parquet files
- The job runs on the app nodes
- The job is based on crontab
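As an illustration only (the script path, schedule, and log location below are assumptions, not taken from the repo), an hourly crontab entry for such a job might look like:

```shell
# Hypothetical crontab entry: every hour, recreate the view and drop
# expired KV partitions / Parquet files (script path is an assumption)
0 * * * * /home/iguazio/parquez/parquez_job.sh >> /tmp/parquez_job.log 2>&1
```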
Users can create a view for the “parquez” table using a script or a REST call.

Script parameters
view-name : The unified view name (Parquet and KV)
partition-by [h / d / M / y] : Only time-based partitioning is supported in this phase
partition-interval [1-24h / 1-31d / 1-12M / 1-Ny] : The partition creation interval
real-time-table-name : The KV table for the view (the full path must be specified)
real-time-window [h / d / M / y] : The time window for storing data in key/value (hot data)
historical-retention [h / d / M / y] : The retention period for all parquez data
config : The config file path
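To make the time parameters concrete: with --real-time-window 3h and --historical-retention 21h (the values used in the run example below), data newer than 3 hours is served from KV, data between 3 and 21 hours old is served from Parquet, and anything older is deleted. A small sketch (not part of Parquez itself) of how a cleanup job could compute those cutoffs with GNU date:

```shell
#!/bin/sh
# Illustrative only: compute the hot-data and retention cutoffs a cleanup
# job would need, using the example values from this README.
REAL_TIME_WINDOW_HOURS=3
HISTORICAL_RETENTION_HOURS=21

NOW=$(date -u +%s)
HOT_CUTOFF=$((NOW - REAL_TIME_WINDOW_HOURS * 3600))        # newer -> KV, older -> Parquet
DELETE_CUTOFF=$((NOW - HISTORICAL_RETENTION_HOURS * 3600)) # older -> deleted

echo "hot data newer than:      $(date -u -d "@${HOT_CUTOFF}" '+%Y-%m-%d %H:00')"
echo "retained data newer than: $(date -u -d "@${DELETE_CUTOFF}" '+%Y-%m-%d %H:00')"
```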
Config file parameters
[v3io]
v3io_container = bigdata
access_key = <access_key>
[hive]
hive_schema = default
[presto]
uri = <presto_uri>
v3io_connector = v3io
hive_connector = hive
[nginx]
v3io_api_endpoint_host = <v3io_api_endpoint_host>
v3io_api_endpoint_port = 443
username = <user_name>
password =
[compression]
type = Parquet
coalesce = 6
# set environment to k8s/vanilla
[environment]
type = k8s
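The config is a standard INI file; a wrapper script could pull individual keys out with a small awk helper. This is a sketch only (it is not how Parquez itself reads the file), and it assumes the simple key = value layout shown above:

```shell
#!/bin/sh
# Illustrative INI lookup: print the value of a key within a section.
ini_get() { # usage: ini_get <file> <section> <key>
  awk -F ' *= *' -v section="[$2]" -v key="$3" '
    $0 == section { in_sec = 1; next }  # entered the requested section
    /^\[/         { in_sec = 0 }        # any other section header ends it
    in_sec && $1 == key { print $2; exit }
  ' "$1"
}

# Hypothetical sample file using keys from the config shown above
cat > /tmp/parquez.ini <<'EOF'
[hive]
hive_schema = default
[presto]
uri = presto-host:8080
EOF

ini_get /tmp/parquez.ini presto uri   # prints: presto-host:8080
```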
Prerequisites
- parquez scripts
- partitioned kv table
Building / deploying the functions
Clone this repository and cd into it:
mkdir parquez && \
git clone https://github.com/iguazio/parquez.git && \
cd parquez

Run parquez:
./run_parquez.sh --view-name parquezView --partition-by h --partition-interval 1h --real-time-window 3h \
--historical-retention 21h --real-time-table-name table_name --config config/parquez.ini
