parquet-files
Here are 31 public repositories matching this topic...
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster (Scala, updated Apr 14, 2021)
A converter from OSM PBF files to Parquet files (Java, updated Aug 6, 2020)
Library to read a subset of Parquet files (C++, updated Feb 13, 2020)
Converts between file formats such as CSV and Parquet (C, updated Sep 28, 2017)
Price Crawler - Tracking Price Inflation (Python, updated Jun 23, 2020). Topics: spark, pandas-dataframe, python3, dash, s3-storage, parquet-files, aws-athena, commoncrawl, petabytes, calculate-inflation-rates
Scala code to read Parquet files as streams in Spark Streaming using Avro (Scala, updated May 5, 2016)
Merge Parquet Files on S3 with this AWS Lambda Function (Python, updated Nov 28, 2020)
Explore factors associated with malware infection using Spark SQL (Jupyter Notebook, updated Aug 15, 2020)
A Docker image for reading Parquet files with Apache Drill in DataGrip (Dockerfile, updated Aug 8, 2019)
A lightweight Java library that facilitates reading and writing Apache Parquet files without Hadoop dependencies (Java, updated Apr 7, 2021)
UniParc dataset describing ~300 million protein sequences, converted into relational tables accessible through Google BigQuery and as Parquet files (Rust, updated Mar 24, 2021)
Read Sparkify music-app data from S3, transform it in Spark, and save the results as Parquet files (Jupyter Notebook, updated Sep 23, 2019)
A lightweight command-line tool to browse and query CSV, Excel and Apache Parquet files, regardless of their size (Python, updated Jan 9, 2021)
Udacity Data Engineering Nanodegree Program - My Submission of Project: Data Lake (Python, updated Mar 27, 2021)
Load data from the Million Song Dataset into a final dimensional model stored in S3 (Python, updated May 17, 2020)
Data engineering project on how to build a data lake on S3 using the Chicago Taxi dataset (Jupyter Notebook, updated Jun 20, 2020)
A web application for viewing Apache Parquet files, written in Python with Flask (HTML, updated Apr 17, 2018)
Upstream classifier image preprocessing (Jupyter Notebook, updated Feb 24, 2021)
Cassandra database to Parquet file (Python, updated Oct 31, 2020)
Python skills practice (Python, updated Jan 26, 2021). Topics: json, pandas-dataframe, pandas, hackerrank, pyspark, kafka-consumer, xml-parser, kafka-producer, parquet-files, google-address-validation, pandas-python


Hello everyone,
Recently I tried to set up petastorm on my company's Hadoop cluster.
However, the cluster uses Kerberos for authentication, so using petastorm failed.
I found that petastorm relies on pyarrow, which does support Kerberos authentication.
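For context, pyarrow's HDFS client accepts the path of an existing Kerberos ticket cache through the `kerb_ticket` argument of `pyarrow.fs.HadoopFileSystem` (the older `pyarrow.hdfs.connect` takes a parameter of the same name). This is an independent sketch of how such a connection could be opened, not the `namenode.py` change described in this post. It assumes `kinit` has already been run, libhdfs is set up (`JAVA_HOME`, `HADOOP_HOME`, `CLASSPATH`), and the ticket cache is a plain file named by `KRB5CCNAME` or the common default `/tmp/krb5cc_<uid>`. The helper names `default_ticket_cache`, `hdfs_connect_kwargs` and `connect_kerberized_hdfs` are hypothetical.

```python
import os
from typing import Optional


def default_ticket_cache() -> Optional[str]:
    """Return a Kerberos ticket cache file path, or None if none can be determined.

    Assumption: the cache is a plain file. KRB5CCNAME is honoured (an optional
    "FILE:" prefix is stripped); other cache types such as "KEYRING:..." or
    "DIR:..." return None. Without KRB5CCNAME, fall back to the common
    default /tmp/krb5cc_<uid>.
    """
    name = os.environ.get("KRB5CCNAME")
    if name:
        if name.startswith("FILE:"):
            return name[len("FILE:"):]
        if ":" in name.split("/", 1)[0]:
            return None  # some other cache type, e.g. KEYRING: or DIR:
        return name
    if hasattr(os, "getuid"):  # POSIX only
        return f"/tmp/krb5cc_{os.getuid()}"
    return None


def hdfs_connect_kwargs(host: str, port: int = 8020, user: Optional[str] = None,
                        ticket_cache: Optional[str] = None) -> dict:
    """Build keyword arguments for pyarrow.fs.HadoopFileSystem."""
    kwargs = {"host": host, "port": port}
    if user is not None:
        kwargs["user"] = user
    cache = ticket_cache if ticket_cache is not None else default_ticket_cache()
    if cache is not None:
        kwargs["kerb_ticket"] = cache  # path to the ticket cache written by kinit
    return kwargs


def connect_kerberized_hdfs(host: str, port: int = 8020, user: Optional[str] = None):
    """Connect to HDFS using an existing Kerberos ticket (run `kinit` first).

    Needs pyarrow plus a working libhdfs/Hadoop installation. pyarrow is
    imported here so that the helpers above can be used without it.
    """
    from pyarrow import fs
    return fs.HadoopFileSystem(**hdfs_connect_kwargs(host, port, user))
```

For example, `connect_kerberized_hdfs("namenode.example.com")` (a placeholder host name) would return a filesystem object that can be passed as `filesystem=` to `pyarrow.parquet.read_table`. Building the keyword arguments separately keeps the ticket-cache lookup testable without a cluster.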
I hacked "petastorm/petastorm/hdfs/namenode.py" line 250
and replaced it with