The Wayback Machine - https://web.archive.org/web/20200914113440/https://github.com/v3io/parquez

Parquez

A mechanism for storing fresh (hot) data in a NoSQL database and historical data in Parquet/ORC files, while giving users a single point of access (a view) to both real-time and historical data.

The view is created in Presto on top of Hive and V3IO KV. Once the user creates the view, an automated job is scheduled at the given interval:

Job creates the view
Job deletes the old KV partitions and the old Parquet files
Job runs on the app nodes
Job is based on crontab

Users can create a view for the “parquez” table using a script or a REST call.
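
To make the idea of a unified view concrete, here is a minimal sketch of the kind of Presto statement such a view could be built from. It is an illustration only, not parquez's actual implementation: the function name `build_view_sql`, the `catalog.schema.table` naming for both connectors, the use of `UNION ALL`, and the assumption that the KV table and the Hive table share the same columns are all hypothetical. The default values are taken from the sample config file shown later in this README.

```python
def build_view_sql(
    view_name: str,
    kv_table: str,
    hive_table: str,
    hive_schema: str = "default",
    v3io_container: str = "bigdata",
    v3io_connector: str = "v3io",
    hive_connector: str = "hive",
) -> str:
    """Return a Presto statement that unions hot (KV) and historical (Hive) data.

    Assumption: both tables expose the same columns in the same order,
    and each connector addresses tables as catalog.schema.table.
    """
    return (
        f"CREATE OR REPLACE VIEW {hive_connector}.{hive_schema}.{view_name} AS\n"
        f"SELECT * FROM {v3io_connector}.{v3io_container}.{kv_table}\n"
        "UNION ALL\n"
        f"SELECT * FROM {hive_connector}.{hive_schema}.{hive_table}"
    )


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    print(build_view_sql("parquezView", "table_name", "table_name_history"))
```

A query against the resulting view would then read both sources without the user having to know which rows are still in KV and which have already moved to Parquet.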

Script parameters

view-name : The unified view name (Parquet and KV)
partition-by [h / d / M / y] : Only time-based partitioning is supported in this phase
partition-interval [1-24h / 1-31d / 1-12M / 1-Ny] : Partition creation interval
real-time-table-name : The KV table for the view (the full path must be specified)
real-time-window [h / d / M / y] : The time window for storing data in the key-value store (hot data)
historical-retention [h / d / M / y] : The retention period for all parquez data
config : Config file path
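
The interval-style parameters above all share the same `<number><unit>` shape. The sketch below shows one way such values could be parsed and range-checked in Python. It is not taken from the parquez scripts: the names `Interval`, `MAX_INTERVAL` and `parse_interval` are hypothetical, and the upper bounds are simply the ones listed for partition-interval.

```python
import re
from typing import NamedTuple

# Upper bounds copied from the partition-interval description;
# years ("y") have no stated upper bound.
MAX_INTERVAL = {"h": 24, "d": 31, "M": 12, "y": None}


class Interval(NamedTuple):
    value: int
    unit: str  # one of "h", "d", "M", "y"


def parse_interval(text: str) -> Interval:
    """Parse a value such as '1h' or '12M' into an Interval."""
    match = re.fullmatch(r"([0-9]+)([hdMy])", text.strip())
    if match is None:
        raise ValueError(f"invalid interval: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    limit = MAX_INTERVAL[unit]
    if value < 1 or (limit is not None and value > limit):
        raise ValueError(f"interval out of range: {text!r}")
    return Interval(value, unit)


if __name__ == "__main__":
    print(parse_interval("1h"))
    print(parse_interval("12M"))
```

Note that the unit is case-sensitive: `M` means months, so a lowercase `m` is rejected rather than silently treated as minutes or months.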

Config file parameters

[v3io]
v3io_container = bigdata
access_key = <access_key>

[hive]
hive_schema = default

[presto]
uri = <presto_uri>
v3io_connector = v3io
hive_connector = hive

[nginx]
v3io_api_endpoint_host = <v3io_api_endpoint_host>
v3io_api_endpoint_port = 443
username = <user_name>
password =

[compression]
type = Parquet
coalesce = 6

#set environment to k8s/vanilla
[environment]
type = k8s
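
The file above is in INI format, so it can be read with Python's standard `configparser` module. The following is a minimal sketch of doing so, assuming a trimmed copy of the sample config; the helper name `load_config` and the validation of the `[environment]` type are hypothetical and do not necessarily reflect how parquez itself loads the file.

```python
import configparser

# Trimmed copy of the sample config above, kept inline so the sketch is self-contained.
SAMPLE_CONFIG = """\
[v3io]
v3io_container = bigdata

[hive]
hive_schema = default

[presto]
v3io_connector = v3io
hive_connector = hive

[nginx]
v3io_api_endpoint_port = 443
password =

[compression]
type = Parquet
coalesce = 6

#set environment to k8s/vanilla
[environment]
type = k8s
"""


def load_config(text: str) -> configparser.ConfigParser:
    """Parse the INI text and check the environment type."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    env_type = parser["environment"]["type"]
    if env_type not in ("k8s", "vanilla"):
        raise ValueError(f"unsupported environment type: {env_type!r}")
    return parser


if __name__ == "__main__":
    config = load_config(SAMPLE_CONFIG)
    print(config["v3io"]["v3io_container"])
    print(config.getint("nginx", "v3io_api_endpoint_port"))
```

To read the real file instead of the inline sample, `parser.read("config/parquez.ini")` can replace the `read_string` call. An empty `password =` entry is returned as an empty string.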

Prerequisites

  1. parquez scripts
  2. partitioned kv table

Building / deploying the functions

Clone this repository and cd into it:

git clone https://github.com/iguazio/parquez.git && \
    cd parquez

Run the parquez

./run_parquez.sh --view-name parquezView --partition-by h --partition-interval 1h --real-time-window 3h \
--historical-retention 21h --real-time-table-name table_name --config config/parquez.ini
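
To illustrate what the window and retention values in this command mean for the cleanup job, the sketch below computes the two cutoff timestamps they imply. It is an assumption-laden illustration, not the parquez logic: the function names are hypothetical, only the `h` and `d` units are handled (months and years have no fixed length), and the exact boundary handling in the real job may differ.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Only fixed-length units are supported in this sketch.
UNIT_TO_TIMEDELTA = {"h": timedelta(hours=1), "d": timedelta(days=1)}


def window_to_timedelta(window: str) -> timedelta:
    """Convert a value such as '3h' or '2d' into a timedelta."""
    value, unit = int(window[:-1]), window[-1]
    if unit not in UNIT_TO_TIMEDELTA:
        raise ValueError(f"unsupported unit in {window!r}")
    return value * UNIT_TO_TIMEDELTA[unit]


def cleanup_cutoffs(
    now: datetime, real_time_window: str, historical_retention: str
) -> Tuple[datetime, datetime]:
    """Return (kv_cutoff, parquet_cutoff).

    Assumption: KV partitions older than kv_cutoff are no longer hot data,
    and Parquet files older than parquet_cutoff are past retention.
    """
    kv_cutoff = now - window_to_timedelta(real_time_window)
    parquet_cutoff = now - window_to_timedelta(historical_retention)
    return kv_cutoff, parquet_cutoff


if __name__ == "__main__":
    now = datetime(2020, 9, 14, 12, 0)
    kv_cutoff, parquet_cutoff = cleanup_cutoffs(now, "3h", "21h")
    print(kv_cutoff)       # 2020-09-14 09:00:00
    print(parquet_cutoff)  # 2020-09-13 15:00:00
```

With `--real-time-window 3h` and `--historical-retention 21h`, data from the last 3 hours would stay in KV, and anything older than 21 hours would be eligible for deletion.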
