parquet
Here are 179 public repositories matching this topic...
Hello everyone,
Recently I tried to set up petastorm on my company's hadoop cluster.
However, because the cluster uses Kerberos for authentication, petastorm failed to connect.
I figured out that petastorm relies on pyarrow which actually supports kerberos authentication.
I hacked "petastorm/petastorm/hdfs/namenode.py" line 250
and replaced it with
driver = 'libhdfs'
return pyarrow.hdfs.c-
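The snippet above is cut off, so the following does not reproduce the author's actual change. It is only a minimal sketch of how a Kerberos ticket cache could be handed to pyarrow, assuming the legacy `pyarrow.hdfs.connect(host, port, user, kerb_ticket, driver)` API that was current when this issue was written (it has since been deprecated). The helper names `build_hdfs_connect_kwargs` and `connect_with_kerberos`, and the ticket path in the usage note, are hypothetical.

```python
from typing import Any, Callable, Dict, Optional


def build_hdfs_connect_kwargs(
    host: str,
    port: int,
    kerb_ticket: Optional[str] = None,
    user: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a libhdfs-based connection.

    Hypothetical helper: the argument names mirror the legacy
    pyarrow.hdfs.connect signature, which is an assumption here.
    """
    kwargs: Dict[str, Any] = {"host": host, "port": port, "driver": "libhdfs"}
    if user is not None:
        kwargs["user"] = user
    if kerb_ticket is not None:
        # Path to the Kerberos ticket cache file.
        kwargs["kerb_ticket"] = kerb_ticket
    return kwargs


def connect_with_kerberos(
    host: str,
    port: int,
    kerb_ticket: str,
    user: Optional[str] = None,
    connect: Optional[Callable[..., Any]] = None,
) -> Any:
    """Open an HDFS connection, passing the ticket cache through.

    `connect` can be injected for testing; by default the legacy
    pyarrow entry point is imported lazily.
    """
    if connect is None:
        import pyarrow.hdfs  # legacy API, assumed to be available

        connect = pyarrow.hdfs.connect
    return connect(**build_hdfs_connect_kwargs(host, port, kerb_ticket, user))
```

A call might look like `connect_with_kerberos("namenode.example.com", 8020, "/tmp/krb5cc_1000")`, where the host and the ticket cache path are placeholders for the cluster's real values.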
-
Currently, there isn't a way to get the table properties in the SparkOrcWriter via the WriterFactory.
-
Over time we've had some things leak into the diff methods that make it more cumbersome to use BigDiffy via code instead of CLI.
For example, see diffAvro here: https://github.com/spotify/ratatool/blob/master/ratatool-diffy/src/main/scala/com/spotify/ratatool/diffy/BigDiffy.scala#L284
The user has to pass in the schema manually; otherwise they receive an uninformative error about a null schema, add
-
Problem description
Our dask update graphs are not properly optimized.
We usually use dask.dataframe optimization and set ave_width=repartition_ratio for kartothek.io.dask.dataframe.update_dataset_from_ddf graphs. We should return an optimized graph from update_dataset_from_ddf to make our users' lives simpler.
We already have code that does this; whoever picks this up can ping me.
-


Append class to all HashCodeBuilders in Gaffer for the below issue to minimise hash collisions.
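To illustrate why appending the class helps, here is a small Python sketch of the idea. It is not Gaffer's code. It assumes the builders follow the common multiply-and-add pattern, and every name in it (`HashBuilder`, `stable_str_hash`, `Src`, `Dst`) is hypothetical. Two different classes holding the same field value collide when only the field is hashed, and stop colliding once the class name is appended first.

```python
def stable_str_hash(text: str) -> int:
    """Deterministic 32-bit string hash (31-multiplier polynomial)."""
    h = 0
    for ch in text:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


class HashBuilder:
    """Toy multiply-and-add hash builder, for illustration only."""

    def __init__(self, initial: int = 17, multiplier: int = 37) -> None:
        self._total = initial
        self._multiplier = multiplier

    def append(self, value: str) -> "HashBuilder":
        # Fold the next value into the running total, kept to 32 bits.
        self._total = (
            self._total * self._multiplier + stable_str_hash(value)
        ) & 0xFFFFFFFF
        return self

    def to_hash(self) -> int:
        return self._total


class Element:
    def __init__(self, vertex: str) -> None:
        self.vertex = vertex

    def hash_without_class(self) -> int:
        # Only the field contributes, so sibling classes can collide.
        return HashBuilder().append(self.vertex).to_hash()

    def hash_with_class(self) -> int:
        # The class name is appended first, separating sibling classes.
        return (
            HashBuilder()
            .append(type(self).__name__)
            .append(self.vertex)
            .to_hash()
        )


class Src(Element):
    pass


class Dst(Element):
    pass


if __name__ == "__main__":
    a, b = Src("v1"), Dst("v1")
    print(a.hash_without_class() == b.hash_without_class())
    print(a.hash_with_class() == b.hash_with_class())
```

Appending the class does not rule out collisions entirely. It only removes the systematic ones between objects of different types that happen to carry identical field values.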