parquet-files

Hello everyone,
I recently tried to set up petastorm on my company's Hadoop cluster.
However, because the cluster uses Kerberos for authentication, using petastorm failed.
I figured out that petastorm relies on pyarrow, which does support Kerberos authentication.

I hacked line 250 of "petastorm/petastorm/hdfs/namenode.py" and replaced it with:

driver = 'libhdfs'
return pyarrow.hdfs.c
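
As an illustration only (this is not the patch from the post, whose code is truncated in this excerpt), here is a minimal sketch of how a Kerberos ticket cache might be passed to pyarrow's legacy `pyarrow.hdfs.connect` call, which accepts `host`, `port`, `user`, `kerb_ticket` and `driver` arguments. The helper names `kerberos_hdfs_kwargs` and `connect_kerberized_hdfs`, the host name, the default port and the ticket cache path are all hypothetical. Newer pyarrow releases deprecate this API in favor of `pyarrow.fs.HadoopFileSystem`, so treat this as a sketch for older versions.

```python
def kerberos_hdfs_kwargs(host, port=8020, user=None, ticket_cache=None):
    """Build keyword arguments for pyarrow's legacy hdfs.connect call.

    Hypothetical helper: the names and defaults here are assumptions,
    not part of petastorm or pyarrow.
    """
    kwargs = {"host": host, "port": port, "driver": "libhdfs"}
    if user is not None:
        kwargs["user"] = user
    if ticket_cache is not None:
        # Path to the Kerberos ticket cache, e.g. one created by `kinit`.
        kwargs["kerb_ticket"] = ticket_cache
    return kwargs


def connect_kerberized_hdfs(**kwargs):
    """Open the connection; needs an older pyarrow and a reachable cluster."""
    # Imported lazily so the argument builder works without pyarrow installed.
    import pyarrow.hdfs

    return pyarrow.hdfs.connect(**kwargs)


if __name__ == "__main__":
    # Example values only; replace with your namenode and ticket cache.
    print(kerberos_hdfs_kwargs(
        "namenode.example.com",
        user="alice",
        ticket_cache="/tmp/krb5cc_1000",
    ))
```

Keeping the argument construction separate from the actual connection makes it possible to inspect what would be passed to pyarrow without a cluster at hand.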



Here are 31 public repositories matching this topic...

uber / petastorm

Full Support for Kerberos secured Hadoop Cluster

Call stack is not reported if an error occurs on a worker thread (when using workers_pool.ThreadPool)

petastorm-generate-metadata.py cannot locate unischema class due to unexpected working directory

Cinchoo / ChoETL

mjakubowski84 / parquet4s

hrbrmstr / sergeant

minio / spark-select

adrianulbona / osm-parquetizer

hannesmuehleisen / miniparquet

renesugar / FileConvert

uhussain / WebCrawlerForOnlineInflation

hrbrmstr / sergeant-caffeinated

adrigrillo / NYCSparkTaxi

Foroozani / BigData_PySpark

gpapag / spark-streaming-parquet

EnsleyEC / parquet-file-concepts

m-kwiedor / lambda-merge-parquet

sudip-padhye / EDA-of-Malware-Infected-Devices-using-PySpark

mschermann / docker_apache_drill_datagrip

strategicblue / parquet-floor

ostrokach / uniparc_xml_parser

shilpamadini / DWH-S3-Spark

IgnacioMB / csvcli

guda249 / spark-parque

FutureTroglodyte / udacity-nd027-data_lake

rigganni / AWS-Spark-Million-Song-ETL

bdnf / Building-DataLake-with-Spark-and-S3

samvel1024 / csv2pq

vipinc007 / ParquetViewer

EtienneLardeur / P8_FruitsRecognition

effulgenz-emp / data_pull

ankhipaul / python_demos
